Preface to the Instructor


You are probably about to teach a course that will give students 
their second exposure to linear algebra. During their first brush with 
the subject, your students probably worked with Euclidean spaces and 
matrices. In contrast, this course will emphasize abstract vector spaces 
and linear maps. 

The audacious title of this book deserves an explanation. Almost
all linear algebra books use determinants to prove that every linear
operator on a finite-dimensional complex vector space has an eigenvalue.
Determinants are difficult, nonintuitive, and often defined without
motivation. To prove the theorem about existence of eigenvalues on
complex vector spaces, most books must define determinants, prove that a
linear map is not invertible if and only if its determinant equals 0, and
then define the characteristic polynomial. This tortuous (torturous?)
path gives students little feeling for why eigenvalues must exist.

In contrast, the simple determinant-free proofs presented here
offer more insight. Once determinants have been banished to the end
of the book, a new route opens to the main goal of linear algebra:
understanding the structure of linear operators.

This book starts at the beginning of the subject, with no
prerequisites other than the usual demand for suitable mathematical maturity.
Even if your students have already seen some of the material in the 
first few chapters, they may be unaccustomed to working exercises of 
the type presented here, most of which require an understanding of 
proofs. 

• Vector spaces are defined in Chapter 1, and their basic properties 
are developed. 

• Linear independence, span, basis, and dimension are defined in 
Chapter 2, which presents the basic theory of finite-dimensional 
vector spaces. 


• Linear maps are introduced in Chapter 3. The key result here
is that for a linear map T, the dimension of the null space of T 
plus the dimension of the range of T equals the dimension of the 
domain of T. 

• The part of the theory of polynomials that will be needed to
understand linear operators is presented in Chapter 4. If you take
class time going through the proofs in this chapter (which
contains no linear algebra), then you probably will not have time to
cover some important aspects of linear algebra. Your students 
will already be familiar with the theorems about polynomials in 
this chapter, so you can ask them to read the statements of the 
results but not the proofs. The curious students will read some 
of the proofs anyway, which is why they are included in the text. 

• The idea of studying a linear operator by restricting it to small 
subspaces leads in Chapter 5 to eigenvectors. The highlight of the 
chapter is a simple proof that on complex vector spaces, eigenvalues
always exist. This result is then used to show that each linear
operator on a complex vector space has an upper-triangular matrix
with respect to some basis. Similar techniques are used to
show that every linear operator on a real vector space has an
invariant subspace of dimension 1 or 2. This result is used to prove
that every linear operator on an odd-dimensional real vector space 
has an eigenvalue. All this is done without defining determinants 
or characteristic polynomials! 

• Inner-product spaces are defined in Chapter 6, and their basic 
properties are developed along with standard tools such as
orthonormal bases, the Gram-Schmidt procedure, and adjoints. This
chapter also shows how orthogonal projections can be used to 
solve certain minimization problems. 

• The spectral theorem, which characterizes the linear operators for 
which there exists an orthonormal basis consisting of eigenvectors,
is the highlight of Chapter 7. The work in earlier chapters
pays off here with especially simple proofs. This chapter also
deals with positive operators, linear isometries, the polar
decomposition, and the singular-value decomposition.


• The minimal polynomial, characteristic polynomial, and generalized
eigenvectors are introduced in Chapter 8. The main achievement
of this chapter is the description of a linear operator on
a complex vector space in terms of its generalized eigenvectors.
This description enables one to prove almost all the results usually
proved using Jordan form. For example, these tools are used
to prove that every invertible linear operator on a complex vector 
space has a square root. The chapter concludes with a proof that 
every linear operator on a complex vector space can be put into 
Jordan form. 

• Linear operators on real vector spaces occupy center stage in 
Chapter 9. Here two-dimensional invariant subspaces make up 
for the possible lack of eigenvalues, leading to results analogous 
to those obtained on complex vector spaces. 

• The trace and determinant are defined in Chapter 10 in terms
of the characteristic polynomial (defined earlier without
determinants). On complex vector spaces, these definitions can be
restated: the trace is the sum of the eigenvalues and the determinant
is the product of the eigenvalues (both counting multiplicity).
These easy-to-remember definitions would not be possible
with the traditional approach to eigenvalues because that method
uses determinants to prove that eigenvalues exist. The standard
theorems about determinants now become much clearer. The polar
decomposition and the characterization of self-adjoint operators
are used to derive the change of variables formula for
multivariable integrals in a fashion that makes the appearance of the
determinant there seem natural.

This book usually develops linear algebra simultaneously for real 
and complex vector spaces by letting F denote either the real or the 
complex numbers. Abstract fields could be used instead, but to do so 
would introduce extra abstraction without leading to any new linear
algebra. Another reason for restricting attention to the real and complex
numbers is that polynomials can then be thought of as genuine
functions instead of the more formal objects needed for polynomials with
coefficients in finite fields. Finally, even if the beginning part of the
theory were developed with arbitrary fields, inner-product spaces would
push consideration back to just real and complex vector spaces. 

Even in a book as short as this one, you cannot expect to cover
everything. Going through the first eight chapters is an ambitious goal for a
one-semester course. If you must reach Chapter 10, then I suggest
covering Chapters 1, 2, and 4 quickly (students may have seen this material
in earlier courses) and skipping Chapter 9 (in which case you should 
discuss trace and determinants only on complex vector spaces). 

A goal more important than teaching any particular set of theorems 
is to develop in students the ability to understand and manipulate the 
objects of linear algebra. Mathematics can be learned only by doing; 
fortunately, linear algebra has many good homework problems. When 
teaching this course, I usually assign two or three of the exercises each 
class, due the next class. Going over the homework might take up a 
third or even half of a typical class. 

A solutions manual for all the exercises is available (without charge) 
only to instructors who are using this book as a textbook. To obtain 
the solutions manual, instructors should send an e-mail request to me 
(or contact Springer if I am no longer around). 

Please check my web site for a list of errata (which I hope will be 
empty or almost empty) and other information about this book. 

I would greatly appreciate hearing about any errors in this book, 
even minor ones. I welcome your suggestions for improvements, even 
tiny ones. Please feel free to contact me. 

Have fun! 

Sheldon Axler 
Mathematics Department 
San Francisco State University 
San Francisco, CA 94132, USA 

e-mail: axler@math.sfsu.edu 

www home page: http://math.sfsu.edu/axler



Preface to the Student


You are probably about to begin your second exposure to linear
algebra. Unlike your first brush with the subject, which probably
emphasized Euclidean spaces and matrices, we will focus on abstract vector
spaces and linear maps. These terms will be defined later, so don’t
worry if you don’t know what they mean. This book starts from the
beginning of the subject, assuming no knowledge of linear algebra. The
key point is that you are about to immerse yourself in serious
mathematics, with an emphasis on your attaining a deep understanding of
the definitions, theorems, and proofs. 

You cannot expect to read mathematics the way you read a novel. If 
you zip through a page in less than an hour, you are probably going too 
fast. When you encounter the phrase “as you should verify”, you should 
indeed do the verification, which will usually require some writing on 
your part. When steps are left out, you need to supply the missing 
pieces. You should ponder and internalize each definition. For each 
theorem, you should seek examples to show why each hypothesis is 
necessary. 

Please check my web site for a list of errata (which I hope will be 
empty or almost empty) and other information about this book. 

I would greatly appreciate hearing about any errors in this book, 
even minor ones. I welcome your suggestions for improvements, even 
tiny ones. 

Have fun! 

Sheldon Axler 
Mathematics Department 
San Francisco State University 
San Francisco, CA 94132, USA 

e-mail: axler@math.sfsu.edu 

www home page: http://math.sfsu.edu/axler

Chapter 1


Vector Spaces


Linear algebra is the study of linear maps on finite-dimensional vector
spaces. Eventually we will learn what all these terms mean. In this
chapter we will define vector spaces and discuss their elementary
properties.

In some areas of mathematics, including linear algebra, better theorems
and more insight emerge if complex numbers are investigated
along with real numbers. Thus we begin by introducing the complex 
numbers and their basic properties. 

The symbol $i$ was first
used to denote $\sqrt{-1}$ by
the Swiss
mathematician
Leonhard Euler in 1777.


Complex Numbers 

You should already be familiar with the basic properties of the set $\mathbf{R}$
of real numbers. Complex numbers were invented so that we can take
square roots of negative numbers. The key idea is to assume we have
a square root of $-1$, denoted $i$, and manipulate it using the usual rules
of arithmetic. Formally, a complex number is an ordered pair $(a, b)$,
where $a, b \in \mathbf{R}$, but we will write this as $a + bi$. The set of all complex
numbers is denoted by $\mathbf{C}$:

\[ \mathbf{C} = \{a + bi : a, b \in \mathbf{R}\}. \]

If $a \in \mathbf{R}$, we identify $a + 0i$ with the real number $a$. Thus we can think
of $\mathbf{R}$ as a subset of $\mathbf{C}$.

Addition and multiplication on $\mathbf{C}$ are defined by

\[ (a + bi) + (c + di) = (a + c) + (b + d)i, \]
\[ (a + bi)(c + di) = (ac - bd) + (ad + bc)i; \]

here $a, b, c, d \in \mathbf{R}$. Using multiplication as defined above, you should
verify that $i^2 = -1$. Do not memorize the formula for the product
of two complex numbers; you can always rederive it by recalling that
$i^2 = -1$ and then using the usual rules of arithmetic.
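
The two defining formulas can be tried out directly. The following is a minimal sketch, not part of the text's development: the class name `Complex` and its fields `a` and `b` are hypothetical names chosen here, and the class simply stores a complex number as the ordered pair of its real coefficients.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Complex:
    """A complex number a + bi stored as the ordered pair (a, b)."""
    a: float
    b: float

    def __add__(self, other):
        # (a + bi) + (c + di) = (a + c) + (b + d)i
        return Complex(self.a + other.a, self.b + other.b)

    def __mul__(self, other):
        # (a + bi)(c + di) = (ac - bd) + (ad + bc)i
        return Complex(self.a * other.a - self.b * other.b,
                       self.a * other.b + self.b * other.a)


i = Complex(0, 1)
i_squared = i * i                          # the pair (-1, 0), that is, -1
total = Complex(1, 2) + Complex(3, 4)      # (1 + 2i) + (3 + 4i)
product = Complex(1, 2) * Complex(3, 4)    # (1 + 2i)(3 + 4i)
```

Checking that `i_squared` equals `Complex(-1, 0)` is the computational counterpart of the verification requested above; it does not replace doing that verification by hand.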

You should verify, using the familiar properties of the real numbers,
that addition and multiplication on $\mathbf{C}$ satisfy the following
properties:

commutativity

$w + z = z + w$ and $wz = zw$ for all $w, z \in \mathbf{C}$;

associativity

$(z_1 + z_2) + z_3 = z_1 + (z_2 + z_3)$ and $(z_1 z_2) z_3 = z_1 (z_2 z_3)$ for all
$z_1, z_2, z_3 \in \mathbf{C}$;

identities

$z + 0 = z$ and $z1 = z$ for all $z \in \mathbf{C}$;

additive inverse

for every $z \in \mathbf{C}$, there exists a unique $w \in \mathbf{C}$ such that $z + w = 0$;

multiplicative inverse

for every $z \in \mathbf{C}$ with $z \ne 0$, there exists a unique $w \in \mathbf{C}$ such that
$zw = 1$;

distributive property

$\lambda(w + z) = \lambda w + \lambda z$ for all $\lambda, w, z \in \mathbf{C}$.

For $z \in \mathbf{C}$, we let $-z$ denote the additive inverse of $z$. Thus $-z$ is
the unique complex number such that

\[ z + (-z) = 0. \]

Subtraction on $\mathbf{C}$ is defined by

\[ w - z = w + (-z) \]

for $w, z \in \mathbf{C}$.

For $z \in \mathbf{C}$ with $z \ne 0$, we let $1/z$ denote the multiplicative inverse
of $z$. Thus $1/z$ is the unique complex number such that

\[ z(1/z) = 1. \]

Division on $\mathbf{C}$ is defined by

\[ w/z = w(1/z) \]

for $w, z \in \mathbf{C}$ with $z \ne 0$.
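
The definition of division as multiplication by an inverse can be illustrated as follows. This is a sketch under stated assumptions: complex numbers are represented as pairs `(a, b)`, the helper names `mul`, `inverse`, and `div` are hypothetical, and the explicit formula used for $1/z$, namely $(a - bi)/(a^2 + b^2)$, is a standard fact that the text above does not state. Exact rational arithmetic is used so that the check $z(1/z) = 1$ is not blurred by rounding.

```python
from fractions import Fraction


def mul(w, z):
    """Product of complex numbers given as pairs, using the defining formula."""
    (a, b), (c, d) = w, z
    return (a * c - b * d, a * d + b * c)


def inverse(z):
    """Multiplicative inverse of a nonzero z = (a, b).

    Assumption (not from the text): 1/z = (a - bi) / (a^2 + b^2).
    """
    a, b = z
    n = a * a + b * b  # zero exactly when z = 0
    if n == 0:
        raise ZeroDivisionError("0 has no multiplicative inverse")
    return (Fraction(a) / n, Fraction(-b) / n)


def div(w, z):
    """w/z is defined as w(1/z)."""
    return mul(w, inverse(z))


z = (3, 4)
one = mul(z, inverse(z))       # should be the pair (1, 0)
quotient = div((1, 2), z)      # (1 + 2i)/(3 + 4i)
```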

So that we can conveniently make definitions and prove theorems 
that apply to both real and complex numbers, we adopt the following 
notation: 


Throughout this book, 

F stands for either R or C. 


Thus if we prove a theorem involving F, we will know that it holds when 
F is replaced with R and when F is replaced with C. Elements of F are 
called scalars. The word “scalar”, which means number, is often used 
when we want to emphasize that an object is a number, as opposed to 
a vector (vectors will be defined soon). 

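For $z \in \mathbf{F}$ and $m$ a positive integer, we define $z^m$ to denote the
product of $z$ with itself $m$ times:

\[ z^m = \underbrace{z \cdot \cdots \cdot z}_{m \text{ times}}. \]

Clearly $(z^m)^n = z^{mn}$ and $(wz)^m = w^m z^m$ for all $w, z \in \mathbf{F}$ and all
positive integers $m, n$.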


The letter F is used 
because R and C are 
examples of what are 
called fields. In this 
book we will not need 
to deal with fields other 
than R or C. Many of 
the definitions, 
theorems, and proofs 
in linear algebra that 
work for both R and C 
also work without 
change if an arbitrary 
field replaces R or C. 

Definition of Vector Space

Before defining what a vector space is, let’s look at two important
examples. The vector space $\mathbf{R}^2$, which you can think of as a plane,
consists of all ordered pairs of real numbers:

\[ \mathbf{R}^2 = \{(x, y) : x, y \in \mathbf{R}\}. \]

The vector space $\mathbf{R}^3$, which you can think of as ordinary space, consists
of all ordered triples of real numbers:

\[ \mathbf{R}^3 = \{(x, y, z) : x, y, z \in \mathbf{R}\}. \]

To generalize $\mathbf{R}^2$ and $\mathbf{R}^3$ to higher dimensions, we first need to
discuss the concept of lists. Suppose $n$ is a nonnegative integer. A list of
length $n$ is an ordered collection of $n$ objects (which might be numbers,
other lists, or more abstract entities) separated by commas and
surrounded by parentheses. A list of length $n$ looks like this:

\[ (x_1, \dots, x_n). \]

Thus a list of length 2 is an ordered pair and a list of length 3 is an
ordered triple. For $j \in \{1, \dots, n\}$, we say that $x_j$ is the $j$th coordinate
of the list above. Thus $x_1$ is called the first coordinate, $x_2$ is called the
second coordinate, and so on.

Many mathematicians
call a list of length $n$ an
$n$-tuple.

Sometimes we will use the word list without specifying its length. 
Remember, however, that by definition each list has a finite length that 
is a nonnegative integer, so that an object that looks like 

\[ (x_1, x_2, \dots), \]

which might be said to have infinite length, is not a list. A list of length
0 looks like this: $()$. We consider such an object to be a list so that
some of our theorems will not have trivial exceptions.

Two lists are equal if and only if they have the same length and
the same coordinates in the same order. In other words, $(x_1, \dots, x_m)$
equals $(y_1, \dots, y_n)$ if and only if $m = n$ and $x_1 = y_1, \dots, x_m = y_m$.

Lists differ from sets in two ways: in lists, order matters and repetitions
are allowed, whereas in sets, order and repetitions are irrelevant.
For example, the lists $(3, 5)$ and $(5, 3)$ are not equal, but the sets $\{3, 5\}$
and $\{5, 3\}$ are equal. The lists $(4, 4)$ and $(4, 4, 4)$ are not equal (they
do not have the same length), though the sets $\{4, 4\}$ and $\{4, 4, 4\}$ both
equal the set $\{4\}$.
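
Readers who program may recognize this distinction. As a loose analogy only (Python's tuples and sets are being used here as stand-ins for lists and sets in the sense above, which is an assumption of this sketch), the examples just given behave as follows:

```python
# Lists in the sense of the text behave like Python tuples:
# order matters and repetitions count.
lists_differ_by_order = (3, 5) != (5, 3)
lists_differ_by_length = (4, 4) != (4, 4, 4)

# Sets ignore both order and repetition.
sets_ignore_order = {3, 5} == {5, 3}
sets_ignore_repetition = {4, 4} == {4, 4, 4} == {4}

# The list of length 0 is the empty tuple.
empty_list = ()
empty_list_length = len(empty_list)
```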

To define the higher-dimensional analogues of $\mathbf{R}^2$ and $\mathbf{R}^3$, we will
simply replace $\mathbf{R}$ with $\mathbf{F}$ (which equals $\mathbf{R}$ or $\mathbf{C}$) and replace the 2 or 3
with an arbitrary positive integer. Specifically, fix a positive integer $n$
for the rest of this section. We define $\mathbf{F}^n$ to be the set of all lists of
length $n$ consisting of elements of $\mathbf{F}$:

\[ \mathbf{F}^n = \{(x_1, \dots, x_n) : x_j \in \mathbf{F} \text{ for } j = 1, \dots, n\}. \]

For example, if $\mathbf{F} = \mathbf{R}$ and $n$ equals 2 or 3, then this definition of $\mathbf{F}^n$
agrees with our previous notions of $\mathbf{R}^2$ and $\mathbf{R}^3$. As another example,
$\mathbf{C}^4$ is the set of all lists of four complex numbers:

\[ \mathbf{C}^4 = \{(z_1, z_2, z_3, z_4) : z_1, z_2, z_3, z_4 \in \mathbf{C}\}. \]

If $n \ge 4$, we cannot easily visualize $\mathbf{R}^n$ as a physical object. The same
problem arises if we work with complex numbers: $\mathbf{C}^1$ can be thought
of as a plane, but for $n \ge 2$, the human brain cannot provide geometric
models of $\mathbf{C}^n$. However, even if $n$ is large, we can perform algebraic
manipulations in $\mathbf{F}^n$ as easily as in $\mathbf{R}^2$ or $\mathbf{R}^3$. For example, addition is
defined on $\mathbf{F}^n$ by adding corresponding coordinates:

\[ (x_1, \dots, x_n) + (y_1, \dots, y_n) = (x_1 + y_1, \dots, x_n + y_n). \tag{1.1} \]
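
Coordinatewise addition is easy to carry out mechanically, whatever the value of $n$. The sketch below assumes that elements of $\mathbf{F}^n$ are represented as Python tuples of numbers (Python's built-in `complex` type standing in for $\mathbf{C}$); the function name `add` is a hypothetical choice made here.

```python
def add(x, y):
    """Add two lists of the same length by adding corresponding coordinates."""
    if len(x) != len(y):
        raise ValueError("addition is defined only for lists of the same length")
    return tuple(xj + yj for xj, yj in zip(x, y))


# An example in R^3.
x = (1, 2, 3)
y = (10, 20, 30)
real_sum = add(x, y)

# An example in C^2.
z = (1 + 2j, 3j)
w = (2 - 2j, 1)
complex_sum = add(z, w)
```

Note that `add(x, y)` and `add(y, x)` agree here, which illustrates (but of course does not prove) commutativity of addition.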

Often the mathematics of $\mathbf{F}^n$ becomes cleaner if we use a single
entity to denote a list of $n$ numbers, without explicitly writing the
coordinates. Thus the commutative property of addition on $\mathbf{F}^n$ should
be expressed as

\[ x + y = y + x \]

for all $x, y \in \mathbf{F}^n$, rather than the more cumbersome

\[ (x_1, \dots, x_n) + (y_1, \dots, y_n) = (y_1, \dots, y_n) + (x_1, \dots, x_n) \]

for all $x_1, \dots, x_n, y_1, \dots, y_n \in \mathbf{F}$ (even though the latter formulation
is needed to prove commutativity). If a single letter is used to denote
an element of $\mathbf{F}^n$, then the same letter, with appropriate subscripts,
is often used when coordinates must be displayed. For example, if
$x \in \mathbf{F}^n$, then letting $x$ equal $(x_1, \dots, x_n)$ is good notation. Even better,
work with just $x$ and avoid explicit coordinates, if possible.

For an amusing
account of how $\mathbf{R}^3$
would be perceived by
a creature living in $\mathbf{R}^2$,
read Flatland: A
Romance of Many
Dimensions, by Edwin
A. Abbott. This novel,
published in 1884, can
help creatures living in
three-dimensional
space, such as
ourselves, imagine a
physical space of four
or more dimensions.

We let $0$ denote the list of length $n$ all of whose coordinates are 0:

\[ 0 = (0, \dots, 0). \]

Note that we are using the symbol 0 in two different ways: on the
left side of the equation above, 0 denotes a list of length $n$, whereas
on the right side, each 0 denotes a number. This potentially confusing
practice actually causes no problems because the context always makes
clear what is intended. For example, consider the statement that 0 is
an additive identity for $\mathbf{F}^n$:

\[ x + 0 = x \]

for all $x \in \mathbf{F}^n$. Here 0 must be a list because we have not defined the
sum of an element of $\mathbf{F}^n$ (namely, $x$) and the number 0.

A picture can often aid our intuition. We will draw pictures
depicting $\mathbf{R}^2$ because we can easily sketch this space on two-dimensional
surfaces such as paper and blackboards. A typical element of $\mathbf{R}^2$ is a
point $x = (x_1, x_2)$. Sometimes we think of $x$ not as a point but as an
arrow starting at the origin and ending at $(x_1, x_2)$, as in the picture
below. When we think of $x$ as an arrow, we refer to it as a vector.

Elements of $\mathbf{R}^2$ can be thought of as points or as vectors.

The coordinate axes and the explicit coordinates unnecessarily
clutter the picture above, and often you will gain better understanding by
dispensing with them and just thinking of the vector, as in the next
picture.

Whenever we use pictures in $\mathbf{R}^2$ or use the somewhat vague
language of points and vectors, remember that these are just aids to our
understanding, not substitutes for the actual mathematics that we will
develop. Though we cannot draw good pictures in high-dimensional
spaces, the elements of these spaces are as rigorously defined as
elements of $\mathbf{R}^2$. For example, $(2, -3, 17, \pi, \sqrt{2})$ is an element of $\mathbf{R}^5$, and we
may casually refer to it as a point in $\mathbf{R}^5$ or a vector in $\mathbf{R}^5$ without
worrying about whether the geometry of $\mathbf{R}^5$ has any physical meaning.

Recall that we defined the sum of two elements of $\mathbf{F}^n$ to be the
element of $\mathbf{F}^n$ obtained by adding corresponding coordinates; see 1.1. In
the special case of $\mathbf{R}^2$, addition has a simple geometric interpretation.
Suppose we have two vectors $x$ and $y$ in $\mathbf{R}^2$ that we want to add, as in
the left side of the picture below. Move the vector $y$ parallel to itself so
that its initial point coincides with the end point of the vector $x$. The
sum $x + y$ then equals the vector whose initial point equals the
initial point of $x$ and whose end point equals the end point of the moved
vector $y$, as in the right side of the picture below.

The sum of two vectors

Our treatment of the vector $y$ in the picture above illustrates a standard
philosophy when we think of vectors in $\mathbf{R}^2$ as arrows: we can move an
arrow parallel to itself (not changing its length or direction) and still
think of it as the same vector.


Mathematical models
of the economy often
have thousands of
variables, say
$x_1, \dots, x_{5000}$, which
means that we must
operate in $\mathbf{R}^{5000}$. Such
a space cannot be dealt
with geometrically, but
the algebraic approach
works well. That’s why
our subject is called
linear algebra.

Having dealt with addition in $\mathbf{F}^n$, we now turn to multiplication. We
could define a multiplication on $\mathbf{F}^n$ in a similar fashion, starting with
two elements of $\mathbf{F}^n$ and getting another element of $\mathbf{F}^n$ by multiplying
corresponding coordinates. Experience shows that this definition is not
useful for our purposes. Another type of multiplication, called scalar
multiplication, will be central to our subject. Specifically, we need to
define what it means to multiply an element of $\mathbf{F}^n$ by an element of $\mathbf{F}$.
We make the obvious definition, performing the multiplication in each
coordinate:

\[ a(x_1, \dots, x_n) = (ax_1, \dots, ax_n); \]

here $a \in \mathbf{F}$ and $(x_1, \dots, x_n) \in \mathbf{F}^n$.

In scalar multiplication,
we multiply together a
scalar and a vector,
getting a vector. You
may be familiar with
the dot product in $\mathbf{R}^2$
or $\mathbf{R}^3$, in which we
multiply together two
vectors and obtain a
scalar. Generalizations
of the dot product will
become important
when we study inner
products in Chapter 6.

You may also be
familiar with the cross
product in $\mathbf{R}^3$, in which
we multiply together
two vectors and obtain
another vector. No
useful generalization of
this type of
multiplication exists in
higher dimensions.

Scalar multiplication has a nice geometric interpretation in $\mathbf{R}^2$. If
$a$ is a positive number and $x$ is a vector in $\mathbf{R}^2$, then $ax$ is the vector
that points in the same direction as $x$ and whose length is $a$ times the
length of $x$. In other words, to get $ax$, we shrink or stretch $x$ by a
factor of $a$, depending upon whether $a < 1$ or $a > 1$. The next picture
illustrates this point.

Multiplication by positive scalars

If $a$ is a negative number and $x$ is a vector in $\mathbf{R}^2$, then $ax$ is the vector
that points in the opposite direction as $x$ and whose length is $|a|$ times
the length of $x$, as illustrated in the next picture.

Multiplication by negative scalars
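
The effect of scalar multiplication on length can be checked numerically. This is a small sketch, assuming vectors in $\mathbf{R}^2$ are stored as tuples of floats and that "length" means the usual Euclidean length (the text has not yet defined length formally); the function names `scale` and `length` are hypothetical.

```python
import math


def scale(a, x):
    """Scalar multiplication: multiply each coordinate of x by a."""
    return tuple(a * xj for xj in x)


def length(x):
    """Euclidean length of a vector in R^2 (an assumption of this sketch)."""
    return math.hypot(*x)


x = (3.0, 4.0)               # a vector of length 5
stretched = scale(2, x)      # same direction, twice as long
flipped = scale(-0.5, x)     # opposite direction, half as long
```

Here `length(stretched)` is twice `length(x)`, and `length(flipped)` is $|-0.5|$ times `length(x)`, in agreement with the two pictures described above.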

The motivation for the definition of a vector space comes from the
important properties possessed by addition and scalar multiplication
on $\mathbf{F}^n$. Specifically, addition on $\mathbf{F}^n$ is commutative and associative and
has an identity, namely, 0. Every element has an additive inverse. Scalar
multiplication on $\mathbf{F}^n$ is associative, and scalar multiplication by 1 acts
as a multiplicative identity should. Finally, addition and scalar
multiplication on $\mathbf{F}^n$ are connected by distributive properties.

We will define a vector space to be a set $V$ along with an addition
and a scalar multiplication on $V$ that satisfy the properties discussed
in the previous paragraph. By an addition on $V$ we mean a function
that assigns an element $u + v \in V$ to each pair of elements $u, v \in V$.
By a scalar multiplication on $V$ we mean a function that assigns an
element $av \in V$ to each $a \in \mathbf{F}$ and each $v \in V$.

Now we are ready to give the formal definition of a vector space.
A vector space is a set $V$ along with an addition on $V$ and a scalar
multiplication on $V$ such that the following properties hold:

commutativity

$u + v = v + u$ for all $u, v \in V$;

associativity

$(u + v) + w = u + (v + w)$ and $(ab)v = a(bv)$ for all $u, v, w \in V$
and all $a, b \in \mathbf{F}$;

additive identity

there exists an element $0 \in V$ such that $v + 0 = v$ for all $v \in V$;

additive inverse

for every $v \in V$, there exists $w \in V$ such that $v + w = 0$;

multiplicative identity

$1v = v$ for all $v \in V$;

distributive properties

$a(u + v) = au + av$ and $(a + b)u = au + bu$ for all $a, b \in \mathbf{F}$ and
all $u, v \in V$.
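
Before proving that a particular structure satisfies these properties, it can be reassuring to spot-check them on examples. The sketch below does this for lists of length 3 with coordinatewise operations. It rests on several assumptions made only for illustration: rational numbers (Python's `Fraction`) stand in for scalars so that equality tests are exact, the vectors are chosen at random, and all function names are hypothetical. Passing such a check is evidence, not a proof.

```python
import random
from fractions import Fraction


def add(u, v):
    return tuple(a + b for a, b in zip(u, v))


def scale(a, v):
    return tuple(a * x for x in v)


def random_scalar(rng):
    return Fraction(rng.randint(-9, 9), rng.randint(1, 9))


def random_vector(n, rng):
    return tuple(random_scalar(rng) for _ in range(n))


def axioms_hold(n=3, trials=200, seed=0):
    """Spot-check each vector space property on randomly chosen examples."""
    rng = random.Random(seed)
    zero = (Fraction(0),) * n
    for _ in range(trials):
        u, v, w = (random_vector(n, rng) for _ in range(3))
        a, b = random_scalar(rng), random_scalar(rng)
        negative_v = tuple(-x for x in v)  # candidate additive inverse of v
        checks = [
            add(u, v) == add(v, u),                               # commutativity
            add(add(u, v), w) == add(u, add(v, w)),               # associativity
            scale(a * b, v) == scale(a, scale(b, v)),             # associativity
            add(v, zero) == v,                                    # additive identity
            add(v, negative_v) == zero,                           # additive inverse
            scale(1, v) == v,                                     # multiplicative identity
            scale(a, add(u, v)) == add(scale(a, u), scale(a, v)), # distributive
            scale(a + b, u) == add(scale(a, u), scale(b, u)),     # distributive
        ]
        if not all(checks):
            return False
    return True


result = axioms_hold()
```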

The scalar multiplication in a vector space depends upon $\mathbf{F}$. Thus
when we need to be precise, we will say that $V$ is a vector space over $\mathbf{F}$
instead of saying simply that $V$ is a vector space. For example, $\mathbf{R}^n$ is
a vector space over $\mathbf{R}$, and $\mathbf{C}^n$ is a vector space over $\mathbf{C}$. Frequently, a
vector space over $\mathbf{R}$ is called a real vector space and a vector space over
$\mathbf{C}$ is called a complex vector space. Usually the choice of $\mathbf{F}$ is either
obvious from the context or irrelevant, and thus we often assume that
$\mathbf{F}$ is lurking in the background without specifically mentioning it.

The simplest vector
space contains only
one point. In other
words, $\{0\}$ is a vector
space, though not a
very interesting one.

Though $\mathbf{F}^n$ is our
crucial example of a
vector space, not all
vector spaces consist
of lists. For example,
the elements of $\mathcal{P}(\mathbf{F})$
consist of functions on
$\mathbf{F}$, not lists. In general,
a vector space is an
abstract entity whose
elements might be lists,
functions, or weird
objects.

Elements of a vector space are called vectors or points. This
geometric language sometimes aids our intuition.

Not surprisingly, $\mathbf{F}^n$ is a vector space over $\mathbf{F}$, as you should verify.
Of course, this example motivated our definition of vector space.

For another example, consider $\mathbf{F}^\infty$, which is defined to be the set of
all sequences of elements of $\mathbf{F}$:

\[ \mathbf{F}^\infty = \{(x_1, x_2, \dots) : x_j \in \mathbf{F} \text{ for } j = 1, 2, \dots\}. \]

Addition and scalar multiplication on $\mathbf{F}^\infty$ are defined as expected:

\[ (x_1, x_2, \dots) + (y_1, y_2, \dots) = (x_1 + y_1, x_2 + y_2, \dots), \]
\[ a(x_1, x_2, \dots) = (ax_1, ax_2, \dots). \]

With these definitions, $\mathbf{F}^\infty$ becomes a vector space over $\mathbf{F}$, as you should
verify. The additive identity in this vector space is the sequence
consisting of all 0’s.

Our next example of a vector space involves polynomials. A function
$p : \mathbf{F} \to \mathbf{F}$ is called a polynomial with coefficients in $\mathbf{F}$ if there exist
$a_0, \dots, a_m \in \mathbf{F}$ such that

\[ p(z) = a_0 + a_1 z + a_2 z^2 + \cdots + a_m z^m \]

for all $z \in \mathbf{F}$. We define $\mathcal{P}(\mathbf{F})$ to be the set of all polynomials with
coefficients in $\mathbf{F}$. Addition on $\mathcal{P}(\mathbf{F})$ is defined as you would expect: if
$p, q \in \mathcal{P}(\mathbf{F})$, then $p + q$ is the polynomial defined by

\[ (p + q)(z) = p(z) + q(z) \]

for $z \in \mathbf{F}$. For example, if $p$ is the polynomial defined by $p(z) = 2z + z^3$
and $q$ is the polynomial defined by $q(z) = 7 + 4z$, then $p + q$ is the
polynomial defined by $(p + q)(z) = 7 + 6z + z^3$. Scalar multiplication
on $\mathcal{P}(\mathbf{F})$ also has the obvious definition: if $a \in \mathbf{F}$ and $p \in \mathcal{P}(\mathbf{F})$, then
$ap$ is the polynomial defined by

\[ (ap)(z) = ap(z) \]

for $z \in \mathbf{F}$. With these definitions of addition and scalar multiplication,
$\mathcal{P}(\mathbf{F})$ is a vector space, as you should verify. The additive identity in
this vector space is the polynomial all of whose coefficients equal 0.
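
The example $p + q$ above can be reproduced with a short computation. The sketch below makes one representational assumption that is not in the text: a polynomial is stored as the list of its coefficients $[a_0, a_1, \dots, a_m]$, rather than as a function. The names `poly_add`, `poly_scale`, and `evaluate` are hypothetical. Note that with this representation the same polynomial can be written with extra trailing zeros, so equality of coefficient lists is a stricter test than equality of polynomials.

```python
from itertools import zip_longest


def poly_add(p, q):
    """Add polynomials given as coefficient lists [a_0, a_1, ..., a_m]."""
    return [a + b for a, b in zip_longest(p, q, fillvalue=0)]


def poly_scale(a, p):
    """Multiply every coefficient of p by the scalar a."""
    return [a * c for c in p]


def evaluate(p, z):
    """Evaluate a_0 + a_1 z + ... + a_m z^m by Horner's rule."""
    result = 0
    for c in reversed(p):
        result = result * z + c
    return result


p = [0, 2, 0, 1]        # p(z) = 2z + z^3
q = [7, 4]              # q(z) = 7 + 4z
s = poly_add(p, q)      # coefficients of 7 + 6z + z^3
t = poly_scale(3, q)    # coefficients of 21 + 12z
```

For any particular number `z`, `evaluate(s, z)` equals `evaluate(p, z) + evaluate(q, z)`, which is exactly the defining equation $(p + q)(z) = p(z) + q(z)$.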

Soon we will see further examples of vector spaces, but first we need 
to develop some of the elementary properties of vector spaces. 

Properties of Vector Spaces

The definition of a vector space requires that it have an additive 
identity. The proposition below states that this identity is unique. 

1.2 Proposition: A vector space has a unique additive identity. 

Proof: Suppose $0$ and $0'$ are both additive identities for some vector
space $V$. Then

\[ 0' = 0' + 0 = 0, \]

where the first equality holds because $0$ is an additive identity and the
second equality holds because $0'$ is an additive identity. Thus $0' = 0$,
proving that $V$ has only one additive identity. ■

Each element v in a vector space has an additive inverse, an element 
w in the vector space such that v + w = 0. The next proposition shows 
that each element in a vector space has only one additive inverse. 

1.3 Proposition: Every element in a vector space has a unique 
additive inverse. 

Proof: Suppose $V$ is a vector space. Let $v \in V$. Suppose that $w$
and $w'$ are additive inverses of $v$. Then

\[ w = w + 0 = w + (v + w') = (w + v) + w' = 0 + w' = w'. \]

Thus $w = w'$, as desired. ■

Because additive inverses are unique, we can let $-v$ denote the
additive inverse of a vector $v$. We define $w - v$ to mean $w + (-v)$.

Almost all the results in this book will involve some vector space. 
To avoid being distracted by having to restate frequently something 
such as “Assume that V is a vector space”, we now make the necessary 
declaration once and for all: 


Let’s agree that for the rest of the book 
V will denote a vector space over F. 


The symbol ■ means
“end of the proof”.

Note that 1.4 and 1.5
assert something about 
scalar multiplication 
and the additive 
identity of V. The only 
part of the definition of 
a vector space that 
connects scalar 
multiplication and 
vector addition is the 
distributive property. 
Thus the distributive 
property must be used 
in the proofs. 


Because of associativity, we can dispense with parentheses when 
dealing with additions involving more than two elements in a vector 
space. For example, we can write u+v+w without parentheses because 
the two possible interpretations of that expression, namely, (u + v) + w 
and u + (v + w), are equal. We first use this familiar convention of not 
using parentheses in the next proof. In the next proposition, 0 denotes 
a scalar (the number $0 \in \mathbf{F}$) on the left side of the equation and a vector
(the additive identity of $V$) on the right side of the equation.

1.4 Proposition: $0v = 0$ for every $v \in V$.

Proof: For $v \in V$, we have

\[ 0v = (0 + 0)v = 0v + 0v. \]

Adding the additive inverse of $0v$ to both sides of the equation above
gives $0 = 0v$, as desired. ■

In the next proposition, $0$ denotes the additive identity of $V$. Though
their proofs are similar, 1.4 and 1.5 are not identical. More precisely,
1.4 states that the product of the scalar $0$ and any vector equals the
vector $0$, whereas 1.5 states that the product of any scalar and the
vector $0$ equals the vector $0$.

1.5 Proposition: $a0 = 0$ for every $a \in \mathbf{F}$.

Proof: For $a \in \mathbf{F}$, we have

\[ a0 = a(0 + 0) = a0 + a0. \]

Adding the additive inverse of $a0$ to both sides of the equation above
gives $0 = a0$, as desired. ■

Now we show that if an element of V is multiplied by the scalar -1, 
then the result is the additive inverse of the element of V. 

1.6 Proposition: $(-1)v = -v$ for every $v \in V$.

Proof: For $v \in V$, we have

\[ v + (-1)v = 1v + (-1)v = (1 + (-1))v = 0v = 0. \]

This equation says that $(-1)v$, when added to $v$, gives $0$. Thus $(-1)v$
must be the additive inverse of $v$, as desired. ■

Subspaces

A subset $U$ of $V$ is called a subspace of $V$ if $U$ is also a vector space
(using the same addition and scalar multiplication as on $V$). For example,

\[ \{(x_1, x_2, 0) : x_1, x_2 \in \mathbf{F}\} \]

is a subspace of $\mathbf{F}^3$.

If $U$ is a subset of $V$, then to check that $U$ is a subspace of $V$ we
need only check that $U$ satisfies the following:

additive identity

$0 \in U$;

closed under addition

$u, v \in U$ implies $u + v \in U$;

closed under scalar multiplication

$a \in \mathbf{F}$ and $u \in U$ implies $au \in U$.

The first condition ensures that the additive identity of $V$ is in $U$. The
second condition ensures that addition makes sense on $U$. The third
condition ensures that scalar multiplication makes sense on $U$. To show
that $U$ is a vector space, the other parts of the definition of a vector
space do not need to be checked because they are automatically
satisfied. For example, the associative and commutative properties of
addition automatically hold on $U$ because they hold on the larger space $V$.
As another example, if the third condition above holds and $u \in U$, then
$-u$ (which equals $(-1)u$ by 1.6) is also in $U$, and hence every element
of $U$ has an additive inverse in $U$.

Some mathematicians
use the term linear
subspace, which means
the same as subspace.

Clearly $\{0\}$ is the
smallest subspace of $V$
and $V$ itself is the
largest subspace of $V$.
The empty set is not a
subspace of $V$ because
a subspace must be a
vector space and a
vector space must
contain at least one
element, namely, an
additive identity.

The three conditions above usually enable us to determine quickly
whether a given subset of $V$ is a subspace of $V$. For example, if $b \in \mathbf{F}$,
then

$$\{(x_1, x_2, x_3, x_4) \in \mathbf{F}^4 : x_3 = 5x_4 + b\}$$

is a subspace of $\mathbf{F}^4$ if and only if $b = 0$, as you should verify. As another
example, you should verify that

$$\{p \in \mathcal{P}(\mathbf{F}) : p(3) = 0\}$$

is a subspace of $\mathcal{P}(\mathbf{F})$.
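The three conditions can also be spot-checked numerically. The Python sketch below tests them on a small grid of integer sample vectors for the subset of $\mathbf{F}^4$ defined by $x_3 = 5x_4 + b$. The function names and the sample grid are illustrative assumptions, not part of the text. Passing the check is only evidence that the conditions hold (it is not a proof), whereas a failure exhibits a genuine counterexample.

```python
from itertools import product


def in_subset(x, b):
    """Membership test for {(x1, x2, x3, x4) : x3 = 5*x4 + b}."""
    return x[2] == 5 * x[3] + b


def passes_spot_check(b, values=(-2, 0, 1, 3)):
    """Check the three subspace conditions on a grid of sample vectors."""
    # Sample members of the subset: x1, x2, x4 free, x3 forced by the equation.
    members = [
        (x1, x2, 5 * x4 + b, x4)
        for x1, x2, x4 in product(values, repeat=3)
    ]
    # Additive identity: the zero vector must belong to the subset.
    if not in_subset((0, 0, 0, 0), b):
        return False
    # Closed under addition: sums of sample members must stay in the subset.
    for u, v in product(members, repeat=2):
        if not in_subset(tuple(p + q for p, q in zip(u, v)), b):
            return False
    # Closed under scalar multiplication (integer scalars only, as a sample).
    for a, u in product(values, members):
        if not in_subset(tuple(a * p for p in u), b):
            return False
    return True
```

With these samples the check passes for $b = 0$ and fails for $b = 1$ (the zero vector is then not in the set), which is consistent with the claim that the set is a subspace if and only if $b = 0$.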

The subspaces of $\mathbf{R}^2$ are precisely $\{0\}$, $\mathbf{R}^2$, and all lines in $\mathbf{R}^2$ through
the origin. The subspaces of $\mathbf{R}^3$ are precisely $\{0\}$, $\mathbf{R}^3$, all lines in $\mathbf{R}^3$
through the origin, and all planes in $\mathbf{R}^3$ through the origin. To prove
that all these objects are indeed subspaces is easy; the hard part is to
show that they are the only subspaces of $\mathbf{R}^2$ or $\mathbf{R}^3$. That task will be
easier after we introduce some additional tools in the next chapter.

Sums and Direct Sums

In later chapters, we will find that the notions of vector space sums
and direct sums are useful. We define these concepts here.

When dealing with
vector spaces, we are
usually interested only
in subspaces, as
opposed to arbitrary
subsets. The union of
subspaces is rarely a
subspace (see
Exercise 9 in this
chapter), which is why
we usually work with
sums rather than
unions.

Sums of subspaces in
the theory of vector
spaces are analogous to
unions of subsets in set
theory. Given two
subspaces of a vector
space, the smallest
subspace containing
them is their sum.
Analogously, given two
subsets of a set, the
smallest subset
containing them is
their union.

Suppose $U_1, \dots, U_m$ are subspaces of $V$. The sum of $U_1, \dots, U_m$,
denoted $U_1 + \cdots + U_m$, is defined to be the set of all possible sums of
elements of $U_1, \dots, U_m$. More precisely,

$$U_1 + \cdots + U_m = \{u_1 + \cdots + u_m : u_1 \in U_1, \dots, u_m \in U_m\}.$$

You should verify that if $U_1, \dots, U_m$ are subspaces of $V$, then the sum
$U_1 + \cdots + U_m$ is a subspace of $V$.

Let’s look at some examples of sums of subspaces. Suppose $U$ is the
set of all elements of $\mathbf{F}^3$ whose second and third coordinates equal 0,
and $W$ is the set of all elements of $\mathbf{F}^3$ whose first and third coordinates
equal 0:

$$U = \{(x, 0, 0) \in \mathbf{F}^3 : x \in \mathbf{F}\} \quad \text{and} \quad W = \{(0, y, 0) \in \mathbf{F}^3 : y \in \mathbf{F}\}.$$

Then

1.7    $U + W = \{(x, y, 0) : x, y \in \mathbf{F}\}$,

as you should verify.

As another example, suppose $U$ is as above and $W$ is the set of all
elements of $\mathbf{F}^3$ whose first and second coordinates equal each other
and whose third coordinate equals 0:

$$W = \{(y, y, 0) \in \mathbf{F}^3 : y \in \mathbf{F}\}.$$

Then $U + W$ is also given by 1.7, as you should verify.
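For the second example, one half of the verification comes down to the identity $(x, y, 0) = (x - y, 0, 0) + (y, y, 0)$. The Python sketch below records that decomposition; `decompose` and `add` are hypothetical helper names introduced here only for illustration.

```python
def decompose(x, y):
    """Split (x, y, 0) into u in U = {(x, 0, 0)} and w in W = {(y, y, 0)}."""
    u = (x - y, 0, 0)   # second and third coordinates are 0, so u is in U
    w = (y, y, 0)       # first two coordinates agree and the third is 0
    return u, w


def add(u, w):
    """Coordinatewise sum of two vectors given as tuples."""
    return tuple(p + q for p, q in zip(u, w))
```

This shows that every vector of the form $(x, y, 0)$ lies in $U + W$. The reverse inclusion holds because every element of $U$ and of $W$ has third coordinate 0.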

Suppose $U_1, \dots, U_m$ are subspaces of $V$. Clearly $U_1, \dots, U_m$ are all
contained in $U_1 + \cdots + U_m$ (to see this, consider sums $u_1 + \cdots + u_m$
where all except one of the $u$’s are 0). Conversely, any subspace of $V$
containing $U_1, \dots, U_m$ must contain $U_1 + \cdots + U_m$ (because subspaces
must contain all finite sums of their elements). Thus $U_1 + \cdots + U_m$ is
the smallest subspace of $V$ containing $U_1, \dots, U_m$.

Suppose $U_1, \dots, U_m$ are subspaces of $V$ such that $V = U_1 + \cdots + U_m$.
Thus every element of $V$ can be written in the form

$$u_1 + \cdots + u_m,$$

where each $u_j \in U_j$. We will be especially interested in cases where
each vector in $V$ can be uniquely represented in the form above. This
situation is so important that we give it a special name: direct sum.
Specifically, we say that $V$ is the direct sum of subspaces $U_1, \dots, U_m$,
written $V = U_1 \oplus \cdots \oplus U_m$, if each element of $V$ can be written uniquely
as a sum $u_1 + \cdots + u_m$, where each $u_j \in U_j$.

Let’s look at some examples of direct sums. Suppose $U$ is the subspace
of $\mathbf{F}^3$ consisting of those vectors whose last coordinate equals 0,
and $W$ is the subspace of $\mathbf{F}^3$ consisting of those vectors whose first two
coordinates equal 0:

$$U = \{(x, y, 0) \in \mathbf{F}^3 : x, y \in \mathbf{F}\} \quad \text{and} \quad W = \{(0, 0, z) \in \mathbf{F}^3 : z \in \mathbf{F}\}.$$

Then $\mathbf{F}^3 = U \oplus W$, as you should verify.

As another example, suppose $U_j$ is the subspace of $\mathbf{F}^n$ consisting
of those vectors whose coordinates are all 0, except possibly in the $j$th
slot (for example, $U_2 = \{(0, x, 0, \dots, 0) \in \mathbf{F}^n : x \in \mathbf{F}\}$). Then

$$\mathbf{F}^n = U_1 \oplus \cdots \oplus U_n,$$

as you should verify.

The symbol $\oplus$,
consisting of a plus
sign inside a circle, is
used to denote direct
sums as a reminder
that we are dealing with
a special type of sum of
subspaces: each
element in the direct
sum can be represented
only one way as a sum
of elements from the
specified subspaces.

As a final example, consider the vector space $\mathcal{P}(\mathbf{F})$ of all polynomials
with coefficients in $\mathbf{F}$. Let $U_e$ denote the subspace of $\mathcal{P}(\mathbf{F})$ consisting
of all polynomials $p$ of the form

$$p(z) = a_0 + a_2 z^2 + \cdots + a_{2m} z^{2m},$$

and let $U_o$ denote the subspace of $\mathcal{P}(\mathbf{F})$ consisting of all polynomials $p$
of the form

$$p(z) = a_1 z + a_3 z^3 + \cdots + a_{2m+1} z^{2m+1};$$

here $m$ is a nonnegative integer and $a_0, \dots, a_{2m+1} \in \mathbf{F}$ (the notations
$U_e$ and $U_o$ should remind you of even and odd powers of $z$). You should
verify that

$$\mathcal{P}(\mathbf{F}) = U_e \oplus U_o.$$
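The existence half of this verification can be sketched in a few lines of Python. The sketch assumes a polynomial is stored as a list of coefficients, with `coeffs[k]` the coefficient of $z^k$; that representation and the function name `split_even_odd` are choices made here for illustration, not notation from the text.

```python
def split_even_odd(coeffs):
    """Split a polynomial into its even-power and odd-power parts.

    coeffs[k] is the coefficient of z**k. Both returned lists use the
    same indexing, so adding them entrywise recovers the input.
    """
    even = [c if k % 2 == 0 else 0 for k, c in enumerate(coeffs)]
    odd = [c if k % 2 == 1 else 0 for k, c in enumerate(coeffs)]
    return even, odd
```

For instance, $1 + 2z + 3z^2 + 4z^3$ splits as $(1 + 3z^2) + (2z + 4z^3)$. The sketch only produces one representation; that it is the only one still has to be argued separately.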

Sometimes nonexamples add to our understanding as much as examples.
Consider the following three subspaces of $\mathbf{F}^3$:

$$U_1 = \{(x, y, 0) \in \mathbf{F}^3 : x, y \in \mathbf{F}\};$$

$$U_2 = \{(0, 0, z) \in \mathbf{F}^3 : z \in \mathbf{F}\};$$

$$U_3 = \{(0, y, y) \in \mathbf{F}^3 : y \in \mathbf{F}\}.$$

Clearly $\mathbf{F}^3 = U_1 + U_2 + U_3$ because an arbitrary vector $(x, y, z) \in \mathbf{F}^3$ can
be written as

$$(x, y, z) = (x, y, 0) + (0, 0, z) + (0, 0, 0),$$

where the first vector on the right side is in $U_1$, the second vector is
in $U_2$, and the third vector is in $U_3$. However, $\mathbf{F}^3$ does not equal the
direct sum of $U_1, U_2, U_3$ because the vector $(0, 0, 0)$ can be written in
two different ways as a sum $u_1 + u_2 + u_3$, with each $u_j \in U_j$. Specifically,
we have

$$(0, 0, 0) = (0, 1, 0) + (0, 0, 1) + (0, -1, -1)$$

and, of course,

$$(0, 0, 0) = (0, 0, 0) + (0, 0, 0) + (0, 0, 0),$$

where the first vector on the right side of each equation above is in $U_1$,
the second vector is in $U_2$, and the third vector is in $U_3$.
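The two representations of the zero vector can be checked mechanically. In the Python sketch below, the membership tests are transcriptions of the three subspaces of this nonexample; the function names are hypothetical helpers introduced for illustration.

```python
def in_U1(v):
    """(x, y, 0): third coordinate is 0."""
    return v[2] == 0


def in_U2(v):
    """(0, 0, z): first two coordinates are 0."""
    return v[0] == 0 and v[1] == 0


def in_U3(v):
    """(0, y, y): first coordinate is 0, last two coordinates agree."""
    return v[0] == 0 and v[1] == v[2]


def vector_sum(*vectors):
    """Coordinatewise sum of any number of vectors given as tuples."""
    return tuple(sum(coords) for coords in zip(*vectors))


def is_representation_of_zero(u1, u2, u3):
    """True if u1, u2, u3 lie in U1, U2, U3 respectively and sum to 0."""
    return (in_U1(u1) and in_U2(u2) and in_U3(u3)
            and vector_sum(u1, u2, u3) == (0, 0, 0))
```

Both `((0, 1, 0), (0, 0, 1), (0, -1, -1))` and the all-zero triple pass this test, so the representation of 0 is not unique.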

In the example above, we showed that something is not a direct sum 
by showing that 0 does not have a unique representation as a sum of 
appropriate vectors. The definition of direct sum requires that every 
vector in the space have a unique representation as an appropriate sum. 
Suppose we have a collection of subspaces whose sum equals the whole 
space. The next proposition shows that when deciding whether this 
collection of subspaces is a direct sum, we need only consider whether 
0 can be uniquely written as an appropriate sum. 

1.8 Proposition: Suppose that $U_1, \dots, U_n$ are subspaces of $V$. Then
$V = U_1 \oplus \cdots \oplus U_n$ if and only if both the following conditions hold:

(a) $V = U_1 + \cdots + U_n$;

(b) the only way to write $0$ as a sum $u_1 + \cdots + u_n$, where each
$u_j \in U_j$, is by taking all the $u_j$’s equal to $0$.

Proof: First suppose that $V = U_1 \oplus \cdots \oplus U_n$. Clearly (a) holds
(because of how sum and direct sum are defined). To prove (b), suppose
that $u_1 \in U_1, \dots, u_n \in U_n$ and

$$0 = u_1 + \cdots + u_n.$$

Then each $u_j$ must be $0$ (this follows from the uniqueness part of the
definition of direct sum because $0 = 0 + \cdots + 0$ and $0 \in U_1, \dots, 0 \in U_n$),
proving (b).

Now suppose that (a) and (b) hold. Let $v \in V$. By (a), we can write

$$v = u_1 + \cdots + u_n$$

for some $u_1 \in U_1, \dots, u_n \in U_n$. To show that this representation is
unique, suppose that we also have

$$v = v_1 + \cdots + v_n,$$

where $v_1 \in U_1, \dots, v_n \in U_n$. Subtracting these two equations, we have

$$0 = (u_1 - v_1) + \cdots + (u_n - v_n).$$

Clearly $u_1 - v_1 \in U_1, \dots, u_n - v_n \in U_n$, so the equation above and (b)
imply that each $u_j - v_j = 0$. Thus $u_1 = v_1, \dots, u_n = v_n$, as desired. ■

The next proposition gives a simple condition for testing which pairs 
of subspaces give a direct sum. Note that this proposition deals only 
with the case of two subspaces. When asking about a possible direct 
sum with more than two subspaces, it is not enough to test that any 
two of the subspaces intersect only at 0. To see this, consider the 
nonexample presented just before 1.8. In that nonexample, we had
$\mathbf{F}^3 = U_1 + U_2 + U_3$, but $\mathbf{F}^3$ did not equal the direct sum of $U_1, U_2, U_3$.
However, in that nonexample, we have $U_1 \cap U_2 = U_1 \cap U_3 = U_2 \cap U_3 = \{0\}$
(as you should verify). The next proposition shows that with just two
subspaces we get a nice necessary and sufficient condition for a direct 
sum. 
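The claim about pairwise intersections in the nonexample can be spot-checked by brute force. The Python sketch below scans a small grid of integer vectors and lists those lying in two of the subspaces at once. The grid, the dictionary `SUBSPACES`, and the function name are assumptions made for illustration; a finite scan is evidence, not a proof.

```python
from itertools import combinations, product

# Membership tests for the three subspaces of the nonexample before 1.8.
SUBSPACES = {
    "U1": lambda v: v[2] == 0,                    # (x, y, 0)
    "U2": lambda v: v[0] == 0 and v[1] == 0,      # (0, 0, z)
    "U3": lambda v: v[0] == 0 and v[1] == v[2],   # (0, y, y)
}


def common_vectors(name_a, name_b, values=range(-3, 4)):
    """Grid vectors lying in both named subspaces."""
    return [
        v for v in product(values, repeat=3)
        if SUBSPACES[name_a](v) and SUBSPACES[name_b](v)
    ]


# One entry per unordered pair of subspaces.
pairwise = {
    pair: common_vectors(*pair) for pair in combinations(SUBSPACES, 2)
}
```

On this grid each pairwise intersection contains only the zero vector, even though the sum of the three subspaces is not direct.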

1.9 Proposition: Suppose that $U$ and $W$ are subspaces of $V$. Then
$V = U \oplus W$ if and only if $V = U + W$ and $U \cap W = \{0\}$.

Proof: First suppose that $V = U \oplus W$. Then $V = U + W$ (by the
definition of direct sum). Also, if $v \in U \cap W$, then $0 = v + (-v)$, where
$v \in U$ and $-v \in W$. By the unique representation of $0$ as the sum of a
vector in $U$ and a vector in $W$, we must have $v = 0$. Thus $U \cap W = \{0\}$,
completing the proof in one direction.

To prove the other direction, now suppose that $V = U + W$ and
$U \cap W = \{0\}$. To prove that $V = U \oplus W$, suppose that

$$0 = u + w,$$

where $u \in U$ and $w \in W$. To complete the proof, we need only show
that $u = w = 0$ (by 1.8). The equation above implies that $u = -w \in W$.
Thus $u \in U \cap W$, and hence $u = 0$. This, along with the equation above,
implies that $w = 0$, completing the proof. ■

Sums of subspaces are
analogous to unions of
subsets. Similarly,
direct sums of
subspaces are
analogous to disjoint
unions of subsets. No
two subspaces of a
vector space can be
disjoint because both
must contain 0. So
disjointness is
replaced, at least in the
case of two subspaces,
with the requirement
that the intersection
equals $\{0\}$.

Exercises


1. Suppose $a$ and $b$ are real numbers, not both 0. Find real numbers
$c$ and $d$ such that

$$1/(a + bi) = c + di.$$

2. Show that

$$\frac{-1 + \sqrt{3}\,i}{2}$$

is a cube root of 1 (meaning that its cube equals 1).

3. Prove that $-(-v) = v$ for every $v \in V$.

4. Prove that if $a \in \mathbf{F}$, $v \in V$, and $av = 0$, then $a = 0$ or $v = 0$.

5. For each of the following subsets of $\mathbf{F}^3$, determine whether it is
a subspace of $\mathbf{F}^3$:

(a) $\{(x_1, x_2, x_3) \in \mathbf{F}^3 : x_1 + 2x_2 + 3x_3 = 0\}$;

(b) $\{(x_1, x_2, x_3) \in \mathbf{F}^3 : x_1 + 2x_2 + 3x_3 = 4\}$;

(c) $\{(x_1, x_2, x_3) \in \mathbf{F}^3 : x_1 x_2 x_3 = 0\}$;

(d) $\{(x_1, x_2, x_3) \in \mathbf{F}^3 : x_1 = 5x_3\}$.

6. Give an example of a nonempty subset $U$ of $\mathbf{R}^2$ such that $U$ is
closed under addition and under taking additive inverses (meaning
$-u \in U$ whenever $u \in U$), but $U$ is not a subspace of $\mathbf{R}^2$.

7. Give an example of a nonempty subset $U$ of $\mathbf{R}^2$ such that $U$ is
closed under scalar multiplication, but $U$ is not a subspace of $\mathbf{R}^2$.

8. Prove that the intersection of any collection of subspaces of $V$ is
a subspace of $V$.

9. Prove that the union of two subspaces of $V$ is a subspace of $V$ if
and only if one of the subspaces is contained in the other.

10. Suppose that $U$ is a subspace of $V$. What is $U + U$?

11. Is the operation of addition on the subspaces of $V$ commutative?
Associative? (In other words, if $U_1, U_2, U_3$ are subspaces of $V$, is
$U_1 + U_2 = U_2 + U_1$? Is $(U_1 + U_2) + U_3 = U_1 + (U_2 + U_3)$?)

12. Does the operation of addition on the subspaces of $V$ have an
additive identity? Which subspaces have additive inverses?

13. Prove or give a counterexample: if $U_1, U_2, W$ are subspaces of $V$
such that

$$U_1 + W = U_2 + W,$$

then $U_1 = U_2$.

14. Suppose $U$ is the subspace of $\mathcal{P}(\mathbf{F})$ consisting of all polynomials
$p$ of the form

$$p(z) = az^2 + bz^5,$$

where $a, b \in \mathbf{F}$. Find a subspace $W$ of $\mathcal{P}(\mathbf{F})$ such that
$\mathcal{P}(\mathbf{F}) = U \oplus W$.

15. Prove or give a counterexample: if $U_1, U_2, W$ are subspaces of $V$
such that

$$V = U_1 \oplus W \quad \text{and} \quad V = U_2 \oplus W,$$

then $U_1 = U_2$.

Chapter 2 


Finite-Dimensional Vector Spaces


In the last chapter we learned about vector spaces. Linear algebra 
focuses not on arbitrary vector spaces, but on finite-dimensional vector 
spaces, which we introduce in this chapter. Here we will deal with the 
key concepts associated with these spaces: span, linear independence, 
basis, and dimension. 

Let’s review our standing assumptions: 

Recall that F denotes R or C. 

Recall also that V is a vector space over F. 





Some mathematicians 
use the term linear 
span, which means the 
same as span. 


Recall that by 
definition every list has 
finite length. 


Span and Linear Independence 

A linear combination of a list $(v_1, \dots, v_m)$ of vectors in $V$ is a vector
of the form

2.1    $a_1 v_1 + \cdots + a_m v_m$,

where $a_1, \dots, a_m \in \mathbf{F}$. The set of all linear combinations of $(v_1, \dots, v_m)$
is called the span of $(v_1, \dots, v_m)$, denoted $\operatorname{span}(v_1, \dots, v_m)$. In other
words,

$$\operatorname{span}(v_1, \dots, v_m) = \{a_1 v_1 + \cdots + a_m v_m : a_1, \dots, a_m \in \mathbf{F}\}.$$

As an example of these concepts, suppose $V = \mathbf{F}^3$. The vector
$(7, 2, 9)$ is a linear combination of $((2, 1, 3), (1, 0, 1))$ because

$$(7, 2, 9) = 2(2, 1, 3) + 3(1, 0, 1).$$

Thus $(7, 2, 9) \in \operatorname{span}((2, 1, 3), (1, 0, 1))$.
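The arithmetic in this example is easy to reproduce. The following Python sketch evaluates a linear combination coordinate by coordinate; `linear_combination` is a hypothetical helper name, and vectors are represented as tuples purely for illustration.

```python
def linear_combination(scalars, vectors):
    """Return a_1*v_1 + ... + a_m*v_m for scalars a_j and tuple vectors v_j."""
    # zip(*vectors) groups the k-th coordinates of all vectors together.
    return tuple(
        sum(a * x for a, x in zip(scalars, coords))
        for coords in zip(*vectors)
    )
```

Calling `linear_combination((2, 3), ((2, 1, 3), (1, 0, 1)))` evaluates $2(2, 1, 3) + 3(1, 0, 1)$.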

You should verify that the span of any list of vectors in $V$ is a subspace
of $V$. To be consistent, we declare that the span of the empty list
$()$ equals $\{0\}$ (recall that the empty set is not a subspace of $V$).

If $(v_1, \dots, v_m)$ is a list of vectors in $V$, then each $v_j$ is a linear
combination of $(v_1, \dots, v_m)$ (to show this, set $a_j = 1$ and let the other $a$’s
in 2.1 equal 0). Thus $\operatorname{span}(v_1, \dots, v_m)$ contains each $v_j$. Conversely,
because subspaces are closed under scalar multiplication and addition,
every subspace of $V$ containing each $v_j$ must contain $\operatorname{span}(v_1, \dots, v_m)$.
Thus the span of a list of vectors in $V$ is the smallest subspace of $V$
containing all the vectors in the list.

If $\operatorname{span}(v_1, \dots, v_m)$ equals $V$, we say that $(v_1, \dots, v_m)$ spans $V$. A
vector space is called finite dimensional if some list of vectors in it
spans the space. For example, $\mathbf{F}^n$ is finite dimensional because

$$\big((1, 0, \dots, 0), (0, 1, 0, \dots, 0), \dots, (0, \dots, 0, 1)\big)$$

spans $\mathbf{F}^n$, as you should verify.

Before giving the next example of a finite-dimensional vector space,
we need to define the degree of a polynomial. A polynomial $p \in \mathcal{P}(\mathbf{F})$
is said to have degree $m$ if there exist scalars $a_0, a_1, \dots, a_m \in \mathbf{F}$ with
$a_m \neq 0$ such that

2.2    $p(z) = a_0 + a_1 z + \cdots + a_m z^m$

for all $z \in \mathbf{F}$. The polynomial that is identically 0 is said to have
degree $-\infty$.

For $m$ a nonnegative integer, let $\mathcal{P}_m(\mathbf{F})$ denote the set of all polynomials
with coefficients in $\mathbf{F}$ and degree at most $m$. You should verify
that $\mathcal{P}_m(\mathbf{F})$ is a subspace of $\mathcal{P}(\mathbf{F})$; hence $\mathcal{P}_m(\mathbf{F})$ is a vector space.
This vector space is finite dimensional because it is spanned by the list
$(1, z, \dots, z^m)$; here we are slightly abusing notation by letting $z^k$ denote
a function (so $z$ is a dummy variable).

A vector space that is not finite dimensional is called infinite
dimensional. For example, $\mathcal{P}(\mathbf{F})$ is infinite dimensional. To prove this,
consider any list of elements of $\mathcal{P}(\mathbf{F})$. Let $m$ denote the highest degree
of any of the polynomials in the list under consideration (recall that by
definition a list has finite length). Then every polynomial in the span of
this list must have degree at most $m$. Thus our list cannot span $\mathcal{P}(\mathbf{F})$.
Because no list spans $\mathcal{P}(\mathbf{F})$, this vector space is infinite dimensional.

The vector space $\mathbf{F}^\infty$, consisting of all sequences of elements of $\mathbf{F}$,
is also infinite dimensional, though this is a bit harder to prove. You
should be able to give a proof by using some of the tools we will soon
develop.

Suppose $v_1, \dots, v_m \in V$ and $v \in \operatorname{span}(v_1, \dots, v_m)$. By the definition
of span, there exist $a_1, \dots, a_m \in \mathbf{F}$ such that

$$v = a_1 v_1 + \cdots + a_m v_m.$$

Consider the question of whether the choice of $a$’s in the equation
above is unique. Suppose $\hat{a}_1, \dots, \hat{a}_m$ is another set of scalars such that

$$v = \hat{a}_1 v_1 + \cdots + \hat{a}_m v_m.$$

Subtracting the last two equations, we have

$$0 = (a_1 - \hat{a}_1) v_1 + \cdots + (a_m - \hat{a}_m) v_m.$$

Thus we have written 0 as a linear combination of $(v_1, \dots, v_m)$. If the
only way to do this is the obvious way (using 0 for all scalars), then
each $a_j - \hat{a}_j$ equals 0, which means that each $a_j$ equals $\hat{a}_j$ (and thus
the choice of $a$’s was indeed unique). This situation is so important
that we give it a special name, linear independence, which we now
define.

A list $(v_1, \dots, v_m)$ of vectors in $V$ is called linearly independent if
the only choice of $a_1, \dots, a_m \in \mathbf{F}$ that makes $a_1 v_1 + \cdots + a_m v_m$ equal
0 is $a_1 = \cdots = a_m = 0$. For example,

$$\big((1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0)\big)$$

is linearly independent in $\mathbf{F}^4$, as you should verify. The reasoning in the
previous paragraph shows that $(v_1, \dots, v_m)$ is linearly independent if
and only if each vector in $\operatorname{span}(v_1, \dots, v_m)$ has only one representation
as a linear combination of $(v_1, \dots, v_m)$.

Infinite-dimensional
vector spaces, which
we will not mention
much anymore, are the
center of attention in
the branch of
mathematics called
functional analysis.
Functional analysis
uses tools from both
analysis and algebra.

Most linear algebra
texts define linearly
independent sets
instead of linearly
independent lists. With
that definition, the set
$\{(0,1), (0,1), (1,0)\}$ is
linearly independent in
$\mathbf{F}^2$ because it equals the
set $\{(0,1), (1,0)\}$. With
our definition, the list
$((0,1), (0,1), (1,0))$ is
not linearly
independent (because 1
times the first vector
plus $-1$ times the
second vector plus 0
times the third vector
equals 0). By dealing
with lists instead of
sets, we will avoid
some problems
associated with the
usual approach.

For another example of a linearly independent list, fix a nonnegative
integer $m$. Then $(1, z, \dots, z^m)$ is linearly independent in $\mathcal{P}(\mathbf{F})$. To verify
this, suppose that $a_0, a_1, \dots, a_m \in \mathbf{F}$ are such that

2.3    $a_0 + a_1 z + \cdots + a_m z^m = 0$

for every $z \in \mathbf{F}$. If at least one of the coefficients $a_0, a_1, \dots, a_m$ were
nonzero, then 2.3 could be satisfied by at most $m$ distinct values of $z$ (if
you are unfamiliar with this fact, just believe it for now; we will prove
it in Chapter 4); this contradiction shows that all the coefficients in 2.3
equal 0. Hence $(1, z, \dots, z^m)$ is linearly independent, as claimed.

A list of vectors in $V$ is called linearly dependent if it is not linearly
independent. In other words, a list $(v_1, \dots, v_m)$ of vectors in $V$
is linearly dependent if there exist $a_1, \dots, a_m \in \mathbf{F}$, not all 0, such that
$a_1 v_1 + \cdots + a_m v_m = 0$. For example, $((2, 3, 1), (1, -1, 2), (7, 3, 8))$ is
linearly dependent in $\mathbf{F}^3$ because

$$2(2, 3, 1) + 3(1, -1, 2) + (-1)(7, 3, 8) = (0, 0, 0).$$

As another example, any list of vectors containing the 0 vector is
linearly dependent (why?).

You should verify that a list $(v)$ of length 1 is linearly independent if
and only if $v \neq 0$. You should also verify that a list of length 2 is linearly
independent if and only if neither vector is a scalar multiple of the
other. Caution: a list of length three or more may be linearly dependent
even though no vector in the list is a scalar multiple of any other vector
in the list, as shown by the example in the previous paragraph.
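The caution can be confirmed on the list $((2, 3, 1), (1, -1, 2), (7, 3, 8))$ with a short Python sketch. The helper names are hypothetical; the scalar-multiple test uses the standard fact that $u = av$ for some scalar $a$ (with $v \neq 0$) exactly when $u_i v_j = u_j v_i$ for all $i, j$.

```python
from itertools import permutations


def is_scalar_multiple(u, v):
    """True if u = a*v for some scalar a."""
    if not any(v):                 # v = 0 forces u = 0
        return not any(u)
    n = len(u)
    # Cross-multiplication avoids division and works with integers.
    return all(u[i] * v[j] == u[j] * v[i] for i in range(n) for j in range(n))


def linear_combination(scalars, vectors):
    """Return a_1*v_1 + ... + a_m*v_m for tuple vectors."""
    return tuple(
        sum(a * x for a, x in zip(scalars, coords))
        for coords in zip(*vectors)
    )


vectors = [(2, 3, 1), (1, -1, 2), (7, 3, 8)]

# No vector in the list is a scalar multiple of another one ...
no_multiples = not any(
    is_scalar_multiple(u, v) for u, v in permutations(vectors, 2)
)
# ... yet the scalars 2, 3, -1 (not all 0) give the zero vector.
relation = linear_combination((2, 3, -1), vectors)
```

Here `no_multiples` is true while `relation` is the zero vector, so the list is linearly dependent although no pair of its vectors is proportional.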

If some vectors are removed from a linearly independent list, the 
remaining list is also linearly independent, as you should verify. To 
allow this to remain true even if we remove all the vectors, we declare 
the empty list () to be linearly independent. 

The lemma below will often be useful. It states that given a linearly
dependent list of vectors, with the first vector not zero, one of the
vectors is in the span of the previous ones and furthermore we can
throw out that vector without changing the span of the original list.

2.4 Linear Dependence Lemma: If $(v_1, \dots, v_m)$ is linearly dependent
in $V$ and $v_1 \neq 0$, then there exists $j \in \{2, \dots, m\}$ such that the
following hold:

(a) $v_j \in \operatorname{span}(v_1, \dots, v_{j-1})$;

(b) if the $j$th term is removed from $(v_1, \dots, v_m)$, the span of the
remaining list equals $\operatorname{span}(v_1, \dots, v_m)$.

Proof: Suppose $(v_1, \dots, v_m)$ is linearly dependent in $V$ and $v_1 \neq 0$.
Then there exist $a_1, \dots, a_m \in \mathbf{F}$, not all 0, such that

$$a_1 v_1 + \cdots + a_m v_m = 0.$$

Not all of $a_2, a_3, \dots, a_m$ can be 0 (because $v_1 \neq 0$). Let $j$ be the largest
element of $\{2, \dots, m\}$ such that $a_j \neq 0$. Then

2.5    $v_j = -\frac{a_1}{a_j} v_1 - \cdots - \frac{a_{j-1}}{a_j} v_{j-1}$,

proving (a).

To prove (b), suppose that $u \in \operatorname{span}(v_1, \dots, v_m)$. Then there exist
$c_1, \dots, c_m \in \mathbf{F}$ such that

$$u = c_1 v_1 + \cdots + c_m v_m.$$

In the equation above, we can replace $v_j$ with the right side of 2.5,
which shows that $u$ is in the span of the list obtained by removing the
$j$th term from $(v_1, \dots, v_m)$. Thus (b) holds. ■
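For lists of vectors with rational coordinates, an index $j$ as in the lemma can be found by machine. The Python sketch below is an illustration under stated assumptions: it decides span membership with Gaussian elimination (a tool not developed in this chapter), and it returns the smallest suitable index, whereas the proof selects the largest $j$ with $a_j \neq 0$ for a given relation. The function names are hypothetical.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a list of vectors, by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in r] for r in rows]
    ncols = len(m[0]) if m else 0
    r = 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def in_span(vectors, v):
    """True if v is in span(vectors); the span of the empty list is {0}."""
    return rank(list(vectors) + [v]) == rank(list(vectors))


def first_redundant_index(vectors):
    """Smallest 1-based j with v_j in span(v_1, ..., v_{j-1}), or None."""
    for j in range(1, len(vectors) + 1):
        if in_span(vectors[:j - 1], vectors[j - 1]):
            return j
    return None
```

For the dependent list `[(2, 3, 1), (1, -1, 2), (7, 3, 8)]` the function returns 3, and removing the third vector leaves a list with the same span. For a linearly independent list it returns `None`.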

Now we come to a key result. It says that linearly independent lists 
are never longer than spanning lists. 


2.6 Theorem: In a finite-dimensional vector space, the length of
every linearly independent list of vectors is less than or equal to the
length of every spanning list of vectors.

Proof: Suppose that $(u_1, \dots, u_m)$ is linearly independent in $V$ and
that $(w_1, \dots, w_n)$ spans $V$. We need to prove that $m \le n$. We do so
through the multistep process described below; note that in each step
we add one of the $u$’s and remove one of the $w$’s.

Step 1

The list $(w_1, \dots, w_n)$ spans $V$, and thus adjoining any vector to it
produces a linearly dependent list. In particular, the list

$$(u_1, w_1, \dots, w_n)$$

is linearly dependent. Thus by the linear dependence lemma (2.4),
we can remove one of the $w$’s so that the list $B$ (of length $n$)
consisting of $u_1$ and the remaining $w$’s spans $V$.

Step j

The list $B$ (of length $n$) from step $j - 1$ spans $V$, and thus adjoining
any vector to it produces a linearly dependent list. In particular,
the list of length $(n + 1)$ obtained by adjoining $u_j$ to $B$, placing it
just after $u_1, \dots, u_{j-1}$, is linearly dependent. By the linear
dependence lemma (2.4), one of the vectors in this list is in the span of
the previous ones, and because $(u_1, \dots, u_j)$ is linearly independent,
this vector must be one of the $w$’s, not one of the $u$’s. We
can remove that $w$ from $B$ so that the new list $B$ (of length $n$)
consisting of $u_1, \dots, u_j$ and the remaining $w$’s spans $V$.

After step $m$, we have added all the $u$’s and the process stops. If at
any step we added a $u$ and had no more $w$’s to remove, then we would
have a contradiction. Thus there must be at least as many $w$’s as $u$’s. ■

Suppose that for each
positive integer $m$,
there exists a linearly
independent list of $m$
vectors in $V$. Then this
theorem implies that $V$
is infinite dimensional.

Our intuition tells us that any vector space contained in a
finite-dimensional vector space should also be finite dimensional. We now
prove that this intuition is correct.

2.7 Proposition: Every subspace of a finite-dimensional vector
space is finite dimensional.

Proof: Suppose $V$ is finite dimensional and $U$ is a subspace of $V$.
We need to prove that $U$ is finite dimensional. We do this through the
following multistep construction.

Step 1

If $U = \{0\}$, then $U$ is finite dimensional and we are done. If $U \neq \{0\}$,
then choose a nonzero vector $v_1 \in U$.

Step j

If $U = \operatorname{span}(v_1, \dots, v_{j-1})$, then $U$ is finite dimensional and we are
done. If $U \neq \operatorname{span}(v_1, \dots, v_{j-1})$, then choose a vector $v_j \in U$ such
that

$$v_j \notin \operatorname{span}(v_1, \dots, v_{j-1}).$$

After each step, as long as the process continues, we have constructed
a list of vectors such that no vector in this list is in the span of the
previous vectors. Thus after each step we have constructed a linearly
independent list, by the linear dependence lemma (2.4). This linearly
independent list cannot be longer than any spanning list of $V$ (by 2.6),
and thus the process must eventually terminate, which means that $U$
is finite dimensional. ■

Bases

A basis of $V$ is a list of vectors in $V$ that is linearly independent and
spans $V$. For example,

$$\big((1, 0, \dots, 0), (0, 1, 0, \dots, 0), \dots, (0, \dots, 0, 1)\big)$$

is a basis of $\mathbf{F}^n$, called the standard basis of $\mathbf{F}^n$. In addition to the
standard basis, $\mathbf{F}^n$ has many other bases. For example, $((1, 2), (3, 5))$
is a basis of $\mathbf{F}^2$. The list $((1, 2))$ is linearly independent but is not a
basis of $\mathbf{F}^2$ because it does not span $\mathbf{F}^2$. The list $((1, 2), (3, 5), (4, 7))$
spans $\mathbf{F}^2$ but is not a basis because it is not linearly independent. As
another example, $(1, z, \dots, z^m)$ is a basis of $\mathcal{P}_m(\mathbf{F})$.

The next proposition helps explain why bases are useful. 

2.8 Proposition: A list $(v_1, \dots, v_n)$ of vectors in $V$ is a basis of $V$
if and only if every $v \in V$ can be written uniquely in the form

2.9    $v = a_1 v_1 + \cdots + a_n v_n$,

where $a_1, \dots, a_n \in \mathbf{F}$.

Proof: First suppose that $(v_1, \dots, v_n)$ is a basis of $V$. Let $v \in V$.
Because $(v_1, \dots, v_n)$ spans $V$, there exist $a_1, \dots, a_n \in \mathbf{F}$ such that 2.9
holds. To show that the representation in 2.9 is unique, suppose that
$b_1, \dots, b_n$ are scalars so that we also have

$$v = b_1 v_1 + \cdots + b_n v_n.$$

Subtracting the last equation from 2.9, we get

$$0 = (a_1 - b_1) v_1 + \cdots + (a_n - b_n) v_n.$$

This implies that each $a_j - b_j = 0$ (because $(v_1, \dots, v_n)$ is linearly
independent) and hence $a_1 = b_1, \dots, a_n = b_n$. We have the desired
uniqueness, completing the proof in one direction.

For the other direction, suppose that every $v \in V$ can be written
uniquely in the form given by 2.9. Clearly this implies that $(v_1, \dots, v_n)$
spans $V$. To show that $(v_1, \dots, v_n)$ is linearly independent, suppose
that $a_1, \dots, a_n \in \mathbf{F}$ are such that

$$0 = a_1 v_1 + \cdots + a_n v_n.$$

The uniqueness of the representation 2.9 (with $v = 0$) implies that
$a_1 = \cdots = a_n = 0$. Thus $(v_1, \dots, v_n)$ is linearly independent and
hence is a basis of $V$. ■

This proof is
essentially a repetition
of the ideas that led us
to the definition of
linear independence.
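As a worked illustration of 2.8 (an added example, not part of the original text), take the basis $((1, 2), (3, 5))$ of $\mathbf{F}^2$ mentioned earlier and an arbitrary vector $(x, y) \in \mathbf{F}^2$. Solving for the scalars shows that they exist and are forced:

```latex
\begin{align*}
a_1 (1, 2) + a_2 (3, 5) = (x, y)
  &\iff a_1 + 3a_2 = x \ \text{ and } \ 2a_1 + 5a_2 = y \\
  &\iff a_2 = 2x - y \ \text{ and } \ a_1 = -5x + 3y .
\end{align*}
```

For instance, $(4, 7) = 1 \cdot (1, 2) + 1 \cdot (3, 5)$, and no other choice of scalars works.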

A spanning list in a vector space may not be a basis because it is not 
linearly independent. Our next result says that given any spanning list, 
some of the vectors in it can be discarded so that the remaining list is 
linearly independent and still spans the vector space. 

2.10 Theorem: Every spanning list in a vector space can be reduced
to a basis of the vector space.

Proof: Suppose $(v_1, \dots, v_n)$ spans $V$. We want to remove some
of the vectors from $(v_1, \dots, v_n)$ so that the remaining vectors form a
basis of $V$. We do this through the multistep process described below.
Start with $B = (v_1, \dots, v_n)$.

Step 1

If $v_1 = 0$, delete $v_1$ from $B$. If $v_1 \neq 0$, leave $B$ unchanged.

Step j

If $v_j$ is in $\operatorname{span}(v_1, \dots, v_{j-1})$, delete $v_j$ from $B$. If $v_j$ is not in
$\operatorname{span}(v_1, \dots, v_{j-1})$, leave $B$ unchanged.

Stop the process after step $n$, getting a list $B$. This list $B$ spans $V$
because our original list spanned $V$ and we have discarded only vectors
that were already in the span of the previous vectors. The process
ensures that no vector in $B$ is in the span of the previous ones. Thus $B$
is linearly independent, by the linear dependence lemma (2.4). Hence
$B$ is a basis of $V$. ■

Consider the list 


((1,2),(3,6),(4,7),(5,9)), 

which spans $\mathbf{F}^2$. To make sure that you understand the last proof, you
should verify that the process in the proof produces $((1, 2), (4, 7))$, a
basis of $\mathbf{F}^2$, when applied to the list above.
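The process in the proof of 2.10 can be run by machine on this list. The Python sketch below is an illustration under stated assumptions: vectors have rational coordinates, span membership is decided by Gaussian elimination (a tool not developed in this chapter), and the function names are hypothetical. Testing each vector against the vectors kept so far is equivalent to testing it against all previous vectors, because every deleted vector was already in the span of earlier ones.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a list of vectors, by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in r] for r in rows]
    ncols = len(m[0]) if m else 0
    r = 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def in_span(vectors, v):
    """True if v is in span(vectors); the span of the empty list is {0}."""
    return rank(list(vectors) + [v]) == rank(list(vectors))


def reduce_to_basis(spanning):
    """Delete each vector that lies in the span of the previous ones."""
    basis = []
    for v in spanning:
        # Step 1 is the case basis == []: v is deleted exactly when v = 0.
        if not in_span(basis, v):
            basis.append(v)
    return basis
```

Applied to `[(1, 2), (3, 6), (4, 7), (5, 9)]`, the function keeps `(1, 2)` and `(4, 7)` and deletes the other two vectors, matching the result stated above.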

Our next result, an easy corollary of the last theorem, tells us that 
every finite-dimensional vector space has a basis. 

2.11 Corollary: Every finite-dimensional vector space has a basis. 

Proof: By definition, a finite-dimensional vector space has a spanning
list. The previous theorem tells us that any spanning list can be
reduced to a basis. ■


We have crafted our definitions so that the finite-dimensional vector 
space {0} is not a counterexample to the corollary above. In particular, 
the empty list () is a basis of the vector space {0} because this list has 
been defined to be linearly independent and to have span {0}. 

Our next theorem is in some sense a dual of 2.10, which said that 
every spanning list can be reduced to a basis. Now we show that given 
any linearly independent list, we can adjoin some additional vectors so 
that the extended list is still linearly independent but also spans the 
space. 


2.12 Theorem: Every linearly independent list of vectors in a finite¬ 
dimensional vector space can be extended to a basis of the vector space. 

Proof: Suppose V is finite dimensional and (vi,..., v m ) is linearly 
independent in V. We want to extend (vi,..., v m ) to a basis of V. We 
do this through the multistep process described below. First we let 
(wi,..., w n ) be any list of vectors in V that spans V. 

Step 1 

If W\ is in the span of (vi,..., v m ), let B = (vi,..., v m ). If W\ is 
not in the span of (vi,..., v m ), let B = (vi,..., 


This theorem can be 
used to give another 
proof of the previous 
corollary. Specifically, 
suppose V is finite 
dimensional. This 
theorem implies that 
the empty list () can be 
extended to a basis 
of V. In particular, V 
has a basis. 
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Using the same basic 
ideas but considerably 
more advanced tools, 
this proposition can be 
proved without the 
hypothesis that V is 
finite dimensional. 


Step j 

If Wj is in the span of B, leave B unchanged. If wj is not in the 
span of B, extend B by adjoining wj to it. 

After each step, B is still linearly independent because otherwise the 
linear dependence lemma (2.4) would give a contradiction (recall that 
(Vi,..., v m ) is linearly independent and any w ; that is adjoined to B is 
not in the span of the previous vectors in B). After step n, the span of 
B includes all the w’s. Thus the B obtained after step n spans V and 
hence is a basis of V. m 

As a nice application of the theorem above, we now show that ev¬ 
ery subspace of a finite-dimensional vector space can be paired with 
another subspace to form a direct sum of the whole space. 

2.1 3 Proposition: Suppose V is finite dimensional and U is a sub¬ 
space of V. Then there is a subspace W of V such that V = U ® W. 

Proof: Because V is finite dimensional, so is U (see 2.7). Thus
there is a basis (u_1, ..., u_m) of U (see 2.11). Of course (u_1, ..., u_m)
is a linearly independent list of vectors in V, and thus it can be
extended to a basis (u_1, ..., u_m, w_1, ..., w_n) of V (see 2.12). Let W =
span(w_1, ..., w_n).

To prove that V = U ⊕ W, we need to show that

V = U + W and U ∩ W = {0};

see 1.9. To prove the first equation, suppose that v ∈ V. Then,
because the list (u_1, ..., u_m, w_1, ..., w_n) spans V, there exist scalars
a_1, ..., a_m, b_1, ..., b_n ∈ F such that

v = a_1 u_1 + ... + a_m u_m + b_1 w_1 + ... + b_n w_n.

In other words, we have v = u + w, where u = a_1 u_1 + ... + a_m u_m ∈ U
and w = b_1 w_1 + ... + b_n w_n ∈ W.
Thus v ∈ U + W, completing the proof that V = U + W.

To show that U ∩ W = {0}, suppose v ∈ U ∩ W. Then there exist
scalars a_1, ..., a_m, b_1, ..., b_n ∈ F such that

v = a_1 u_1 + ... + a_m u_m = b_1 w_1 + ... + b_n w_n.

Thus

a_1 u_1 + ... + a_m u_m - b_1 w_1 - ... - b_n w_n = 0.

Because (u_1, ..., u_m, w_1, ..., w_n) is linearly independent, this implies
that a_1 = ... = a_m = b_1 = ... = b_n = 0. Thus v = 0, completing the
proof that U ∩ W = {0}. ■

Dimension 

Though we have been discussing finite-dimensional vector spaces,
we have not yet defined the dimension of such an object. How should
dimension be defined? A reasonable definition should force the
dimension of F^n to equal n. Notice that the basis

((1, 0, ..., 0), (0, 1, 0, ..., 0), ..., (0, ..., 0, 1))

has length n. Thus we are tempted to define the dimension as the
length of a basis. However, a finite-dimensional vector space in general
has many different bases, and our attempted definition makes sense
only if all bases in a given vector space have the same length.
Fortunately that turns out to be the case, as we now show.

2.14 Theorem: Any two bases of a finite-dimensional vector space 
have the same length. 

Proof: Suppose V is finite dimensional. Let B_1 and B_2 be any two
bases of V. Then B_1 is linearly independent in V and B_2 spans V, so the
length of B_1 is at most the length of B_2 (by 2.6). Interchanging the roles
of B_1 and B_2, we also see that the length of B_2 is at most the length
of B_1. Thus the length of B_1 must equal the length of B_2, as desired. ■

Now that we know that any two bases of a finite-dimensional vector 
space have the same length, we can formally define the dimension of 
such spaces. The dimension of a finite-dimensional vector space is 
defined to be the length of any basis of the vector space. The dimension 
of V (if V is finite dimensional) is denoted by dim V. As examples, note
that dim F^n = n and dim P_m(F) = m + 1.

Every subspace of a finite-dimensional vector space is finite
dimensional (by 2.7) and so has a dimension. The next result gives the
expected inequality about the dimension of a subspace.

The real vector space
R^2 has dimension 2;
the complex vector
space C has
dimension 1. As sets,
R^2 can be identified
with C (and addition is
the same on both
spaces, as is scalar
multiplication by real
numbers). Thus when
we talk about the
dimension of a vector
space, the role played
by the choice of F
cannot be neglected.


2.15 Proposition: If V is finite dimensional and U is a subspace
of V, then dim U ≤ dim V.

Proof: Suppose that V is finite dimensional and U is a subspace
of V. Any basis of U is a linearly independent list of vectors in V and
thus can be extended to a basis of V (by 2.12). Hence the length of a
basis of U is less than or equal to the length of a basis of V. ■

To check that a list of vectors in V is a basis of V, we must, according 
to the definition, show that the list in question satisfies two properties: 
it must be linearly independent and it must span V. The next two 
results show that if the list in question has the right length, then we 
need only check that it satisfies one of the required two properties. 
We begin by proving that every spanning list with the right length is a 
basis. 

2.16 Proposition: If V is finite dimensional, then every spanning 
list of vectors in V with length dim V is a basis of V. 

Proof: Suppose dim V = n and (v_1, ..., v_n) spans V. The list
(v_1, ..., v_n) can be reduced to a basis of V (by 2.10). However, every
basis of V has length n, so in this case the reduction must be the trivial
one, meaning that no elements are deleted from (v_1, ..., v_n). In other
words, (v_1, ..., v_n) is a basis of V, as desired. ■

Now we prove that linear independence alone is enough to ensure 
that a list with the right length is a basis. 

2.17 Proposition: If V is finite dimensional, then every linearly 
independent list of vectors in V with length dim V is a basis of V. 

Proof: Suppose dim V = n and (v_1, ..., v_n) is linearly independent
in V. The list (v_1, ..., v_n) can be extended to a basis of V (by 2.12).
However, every basis of V has length n, so in this case the extension must be
the trivial one, meaning that no elements are adjoined to (v_1, ..., v_n).
In other words, (v_1, ..., v_n) is a basis of V, as desired. ■

As an example of how the last proposition can be applied, consider
the list ((5, 7), (4, 3)). This list of two vectors in F^2 is obviously linearly
independent (because neither vector is a scalar multiple of the other).
Because F^2 has dimension 2, the last proposition implies that this
linearly independent list of length 2 is a basis of F^2 (we do not need to
bother checking that it spans F^2).
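
The same check can be carried out mechanically. The sketch below is illustrative and rests on assumptions not in the text: vectors are tuples of rationals, and the names rank and is_basis_by_independence are hypothetical. It tests only the length and linear independence of the list, which is all that 2.17 requires.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a list of vectors, by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    ncols = len(m[0]) if m else 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][c] / m[r][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def is_basis_by_independence(vectors, dim):
    """Apply 2.17: right length plus linear independence gives a basis."""
    # A list is linearly independent exactly when its rank equals its length.
    return len(vectors) == dim and rank(vectors) == len(vectors)


print(is_basis_by_independence([(5, 7), (4, 3)], 2))    # True
print(is_basis_by_independence([(5, 7), (10, 14)], 2))  # False
```

The second list fails because (10, 14) is a scalar multiple of (5, 7).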

The next theorem gives a formula for the dimension of the sum of 
two subspaces of a finite-dimensional vector space. 


2.18 Theorem: If U_1 and U_2 are subspaces of a finite-dimensional
vector space, then

dim(U_1 + U_2) = dim U_1 + dim U_2 - dim(U_1 ∩ U_2).

Proof: Let (u_1, ..., u_m) be a basis of U_1 ∩ U_2; thus dim(U_1 ∩ U_2) =
m. Because (u_1, ..., u_m) is a basis of U_1 ∩ U_2, it is linearly independent
in U_1 and hence can be extended to a basis (u_1, ..., u_m, v_1, ..., v_j) of U_1
(by 2.12). Thus dim U_1 = m + j. Also extend (u_1, ..., u_m) to a basis
(u_1, ..., u_m, w_1, ..., w_k) of U_2; thus dim U_2 = m + k.

We will show that (u_1, ..., u_m, v_1, ..., v_j, w_1, ..., w_k) is a basis of
U_1 + U_2. This will complete the proof because then we will have

dim(U_1 + U_2) = m + j + k
               = (m + j) + (m + k) - m
               = dim U_1 + dim U_2 - dim(U_1 ∩ U_2).


This formula for the 
dimension of the sum 
of two subspaces is 
analogous to a familiar 
counting formula: the 
number of elements in 
the union of two finite 
sets equals the number 
of elements in the first 
set, plus the number of 
elements in the second 
set, minus the number 
of elements in the 
intersection of the two 
sets. 


Clearly span(u_1, ..., u_m, v_1, ..., v_j, w_1, ..., w_k) contains U_1 and U_2
and hence contains U_1 + U_2. So to show that this list is a basis of
U_1 + U_2 we need only show that it is linearly independent. To prove
this, suppose

a_1 u_1 + ... + a_m u_m + b_1 v_1 + ... + b_j v_j + c_1 w_1 + ... + c_k w_k = 0,

where all the a’s, b’s, and c’s are scalars. We need to prove that all the
a’s, b’s, and c’s equal 0. The equation above can be rewritten as

c_1 w_1 + ... + c_k w_k = -a_1 u_1 - ... - a_m u_m - b_1 v_1 - ... - b_j v_j,

which shows that c_1 w_1 + ... + c_k w_k ∈ U_1. All the w’s are in U_2, so this
implies that c_1 w_1 + ... + c_k w_k ∈ U_1 ∩ U_2. Because (u_1, ..., u_m) is a
basis of U_1 ∩ U_2, we can write

c_1 w_1 + ... + c_k w_k = d_1 u_1 + ... + d_m u_m






for some choice of scalars d_1, ..., d_m. But (u_1, ..., u_m, w_1, ..., w_k)
is linearly independent, so the last equation implies that all the c’s
(and d’s) equal 0. Thus our original equation involving the a’s, b’s, and
c’s becomes

a_1 u_1 + ... + a_m u_m + b_1 v_1 + ... + b_j v_j = 0.

This equation implies that all the a’s and b’s are 0 because the list
(u_1, ..., u_m, v_1, ..., v_j) is linearly independent. We now know that all
the a’s, b’s, and c’s equal 0, as desired. ■

The next proposition shows that dimension meshes well with direct 
sums. This result will be useful in later chapters. 


Recall that direct sum 
is analogous to disjoint 
union. Thus 2.19 is 
analogous to the 
statement that if a 
finite set B is written as 
A_1 ∪ ... ∪ A_m and the
sum of the number of 
elements in the A’s 
equals the number of 
elements in B, then the 
union is a disjoint 
union. 


2.19 Proposition: Suppose V is finite dimensional and U_1, ..., U_m
are subspaces of V such that

2.20 V = U_1 + ... + U_m

and

2.21 dim V = dim U_1 + ... + dim U_m.

Then V = U_1 ⊕ ... ⊕ U_m.

Proof: Choose a basis for each Uj. Put these bases together in 
one list, forming a list that spans V (by 2.20) and has length dim V 
(by 2.21). Thus this list is a basis of V (by 2.16), and in particular it is 
linearly independent. 

Now suppose that u_1 ∈ U_1, ..., u_m ∈ U_m are such that

0 = u_1 + ... + u_m.

We can write each u_j as a linear combination of the basis vectors
(chosen above) of U_j. Substituting these linear combinations into the
expression above, we have written 0 as a linear combination of the basis
of V constructed above. Thus all the scalars used in this linear
combination must be 0. Thus each u_j = 0, which proves that
V = U_1 ⊕ ... ⊕ U_m (by 1.8). ■

Exercises

1. Prove that if (v_1, ..., v_n) spans V, then so does the list

(v_1 - v_2, v_2 - v_3, ..., v_{n-1} - v_n, v_n)

obtained by subtracting from each vector (except the last one)
the following vector.

2. Prove that if (v_1, ..., v_n) is linearly independent in V, then so is
the list

(v_1 - v_2, v_2 - v_3, ..., v_{n-1} - v_n, v_n)

obtained by subtracting from each vector (except the last one)
the following vector.

3. Suppose (v_1, ..., v_n) is linearly independent in V and w ∈ V.
Prove that if (v_1 + w, ..., v_n + w) is linearly dependent, then
w ∈ span(v_1, ..., v_n).

4. Suppose m is a positive integer. Is the set consisting of 0 and all
polynomials with coefficients in F and with degree equal to m a
subspace of P(F)?

5. Prove that F^∞ is infinite dimensional.

6. Prove that the real vector space consisting of all continuous
real-valued functions on the interval [0, 1] is infinite dimensional.

7. Prove that V is infinite dimensional if and only if there is a
sequence v_1, v_2, ... of vectors in V such that (v_1, ..., v_n) is linearly
independent for every positive integer n.

8. Let U be the subspace of R^5 defined by

U = {(x_1, x_2, x_3, x_4, x_5) ∈ R^5 : x_1 = 3x_2 and x_3 = 7x_4}.

Find a basis of U.

9. Prove or disprove: there exists a basis (p_0, p_1, p_2, p_3) of P_3(F)
such that none of the polynomials p_0, p_1, p_2, p_3 has degree 2.

10. Suppose that V is finite dimensional, with dim V = n. Prove that
there exist one-dimensional subspaces U_1, ..., U_n of V such that

V = U_1 ⊕ ... ⊕ U_n.

11. Suppose that V is finite dimensional and U is a subspace of V
such that dim U = dim V. Prove that U = V.

12. Suppose that p_0, p_1, ..., p_m are polynomials in P_m(F) such that
p_j(2) = 0 for each j. Prove that (p_0, p_1, ..., p_m) is not linearly
independent in P_m(F).

13. Suppose U and W are subspaces of R^8 such that dim U = 3,
dim W = 5, and U + W = R^8. Prove that U ∩ W = {0}.

14. Suppose that U and W are both five-dimensional subspaces of R^9.
Prove that U ∩ W ≠ {0}.

15. You might guess, by analogy with the formula for the number
of elements in the union of three subsets of a finite set, that
if U_1, U_2, U_3 are subspaces of a finite-dimensional vector space,
then

dim(U_1 + U_2 + U_3)
  = dim U_1 + dim U_2 + dim U_3
    - dim(U_1 ∩ U_2) - dim(U_1 ∩ U_3) - dim(U_2 ∩ U_3)
    + dim(U_1 ∩ U_2 ∩ U_3).

Prove this or give a counterexample.

16. Prove that if V is finite dimensional and U_1, ..., U_m are subspaces
of V, then

dim(U_1 + ... + U_m) ≤ dim U_1 + ... + dim U_m.

17. Suppose V is finite dimensional. Prove that if U_1, ..., U_m are
subspaces of V such that V = U_1 ⊕ ... ⊕ U_m, then

dim V = dim U_1 + ... + dim U_m.


This exercise deepens the analogy between direct sums of
subspaces and disjoint unions of subsets. Specifically, compare this
exercise to the following obvious statement: if a finite set is
written as a disjoint union of subsets, then the number of elements in
the set equals the sum of the number of elements in the disjoint
subsets.



Chapter 3 


Linear Maps 


So far our attention has focused on vector spaces. No one gets
excited about vector spaces. The interesting part of linear algebra is the
subject to which we now turn: linear maps.

Let’s review our standing assumptions: 

Recall that F denotes R or C. 

Recall also that V is a vector space over F. 

In this chapter we will frequently need another vector space in
addition to V. We will call this additional vector space W:

Let’s agree that for the rest of this chapter
W will denote a vector space over F.

Definitions and Examples


Some mathematicians 
use the term linear 
transformation, which 
means the same as 
linear map. 


A linear map from V to W is a function T: V → W with the following
properties:

additivity

T(u + v) = Tu + Tv for all u, v ∈ V;

homogeneity

T(av) = a(Tv) for all a ∈ F and all v ∈ V.


Note that for linear maps we often use the notation Tv as well as the 
more standard functional notation T(v). 

The set of all linear maps from V to W is denoted L(V, W). Let’s
look at some examples of linear maps. Make sure you verify that each 
of the functions defined below is indeed a linear map: 


zero 

In addition to its other uses, we let the symbol 0 denote the
function that takes each element of some vector space to the additive
identity of another vector space. To be specific, 0 ∈ L(V, W) is
defined by

0v = 0.

Note that the 0 on the left side of the equation above is a function
from V to W, whereas the 0 on the right side is the additive
identity in W. As usual, the context should allow you to distinguish
between the many uses of the symbol 0.

identity 

The identity map, denoted I, is the function on some vector space
that takes each element to itself. To be specific, I ∈ L(V, V) is
defined by

Iv = v.


differentiation 

Define T ∈ L(P(R), P(R)) by

Tp = p'.

The assertion that this function is a linear map is another way of
stating a basic result about differentiation: (f + g)' = f' + g' and
(af)' = af' whenever f, g are differentiable and a is a constant.

integration

Define T ∈ L(P(R), R) by

Tp = ∫_0^1 p(x) dx.

The assertion that this function is linear is another way of stating
a basic result about integration: the integral of the sum of two
functions equals the sum of the integrals, and the integral of a
constant times a function equals the constant times the integral
of the function.

multiplication by x^2

Define T ∈ L(P(R), P(R)) by

(Tp)(x) = x^2 p(x)

for x ∈ R.

backward shift

Recall that F^∞ denotes the vector space of all sequences of
elements of F. Define T ∈ L(F^∞, F^∞) by

T(x_1, x_2, x_3, ...) = (x_2, x_3, ...).


from F^n to F^m

Define T ∈ L(R^3, R^2) by

T(x, y, z) = (2x - y + 3z, 7x + 5y - 6z).

More generally, let m and n be positive integers, let a_{j,k} ∈ F for
j = 1, ..., m and k = 1, ..., n, and define T ∈ L(F^n, F^m) by

T(x_1, ..., x_n) = (a_{1,1} x_1 + ... + a_{1,n} x_n, ..., a_{m,1} x_1 + ... + a_{m,n} x_n).

Later we will see that every linear map from F^n to F^m is of this
form.
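
As a quick numerical illustration of the last example, the sketch below builds a map of this form from its coefficients a_{j,k} and spot-checks additivity and homogeneity on sample vectors. It is not a proof, and several things are assumptions made for the illustration: the helper names make_linear_map, add, and scale, the sample vectors, and the reading of the second coordinate of the map above as 7x + 5y - 6z.

```python
def make_linear_map(a):
    """Return T with T(x)_j = a[j][0]*x[0] + ... + a[j][n-1]*x[n-1]."""
    def T(x):
        return tuple(sum(a_jk * x_k for a_jk, x_k in zip(row, x)) for row in a)
    return T


def add(u, v):
    """Coordinatewise sum of two vectors."""
    return tuple(p + q for p, q in zip(u, v))


def scale(c, v):
    """Scalar multiple of a vector."""
    return tuple(c * p for p in v)


# Coefficients of T(x, y, z) = (2x - y + 3z, 7x + 5y - 6z); the 7 is assumed.
T = make_linear_map([[2, -1, 3], [7, 5, -6]])

u, v, c = (1, 2, 3), (4, -5, 6), 10
assert T(add(u, v)) == add(T(u), T(v))    # additivity on one sample pair
assert T(scale(c, u)) == scale(c, T(u))   # homogeneity on one sample

print(T((1, 0, 0)), T((0, 1, 0)), T((0, 0, 1)))
# (2, 7) (-1, 5) (3, -6)
```

The values on the standard basis vectors simply read off the coefficients of the map, one variable at a time.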

Suppose (v_1, ..., v_n) is a basis of V and T: V → W is linear. If v ∈ V,
then we can write v in the form

v = a_1 v_1 + ... + a_n v_n.


Though linear maps are 
pervasive throughout 
mathematics, they are 
not as ubiquitous as 
imagined by some 
confused students who 
seem to think that cos 
is a linear map from R 
to R when they write 
“identities” such as 
cos 2x = 2 cos x and 
cos(x + y) = 
cosx + cosy. 


The linearity of T implies that

Tv = a_1 Tv_1 + ... + a_n Tv_n.

In particular, the values of Tv_1, ..., Tv_n determine the values of T on
arbitrary vectors in V.

Linear maps can be constructed that take on arbitrary values on a
basis. Specifically, given a basis (v_1, ..., v_n) of V and any choice of
vectors w_1, ..., w_n ∈ W, we can construct a linear map T: V → W such
that Tv_j = w_j for j = 1, ..., n. There is no choice of how to do this; we
must define T by

T(a_1 v_1 + ... + a_n v_n) = a_1 w_1 + ... + a_n w_n,

where a_1, ..., a_n are arbitrary elements of F. Because (v_1, ..., v_n) is a
basis of V, the equation above does indeed define a function T from V
to W. You should verify that the function T defined above is linear and
that Tv_j = w_j for j = 1, ..., n.

Now we will make L(V, W) into a vector space by defining addition
and scalar multiplication on it. For S, T ∈ L(V, W), define a function
S + T ∈ L(V, W) in the usual manner of adding functions:

(S + T)v = Sv + Tv

for v ∈ V. You should verify that S + T is indeed a linear map from V
to W whenever S, T ∈ L(V, W). For a ∈ F and T ∈ L(V, W), define a
function aT ∈ L(V, W) in the usual manner of multiplying a function
by a scalar:

(aT)v = a(Tv)

for v ∈ V. You should verify that aT is indeed a linear map from V to W
whenever a ∈ F and T ∈ L(V, W). With the operations we have just
defined, L(V, W) becomes a vector space (as you should verify). Note
that the additive identity of L(V, W) is the zero linear map defined
earlier in this section.

Usually it makes no sense to multiply together two elements of a
vector space, but for some pairs of linear maps a useful product exists.
We will need a third vector space, so suppose U is a vector space over F.
If T ∈ L(U, V) and S ∈ L(V, W), then we define ST ∈ L(U, W) by

(ST)(v) = S(Tv)

for v ∈ U. In other words, ST is just the usual composition S ∘ T of two
functions, but when both functions are linear, most mathematicians
write ST instead of S ∘ T. You should verify that ST is indeed a linear
map from U to W whenever T ∈ L(U, V) and S ∈ L(V, W). Note that
ST is defined only when T maps into the domain of S. We often call
ST the product of S and T. You should verify that it has most of the
usual properties expected of a product:

associativity 

(T_1 T_2)T_3 = T_1(T_2 T_3) whenever T_1, T_2, and T_3 are linear maps such
that the products make sense (meaning that T_3 must map into the
domain of T_2, and T_2 must map into the domain of T_1).

identity

TI = T and IT = T whenever T ∈ L(V, W) (note that in the first
equation I is the identity map on V, and in the second equation I
is the identity map on W).

distributive properties

(S_1 + S_2)T = S_1 T + S_2 T and S(T_1 + T_2) = ST_1 + ST_2 whenever
T, T_1, T_2 ∈ L(U, V) and S, S_1, S_2 ∈ L(V, W).


Multiplication of linear maps is not commutative. In other words, it
is not necessarily true that ST = TS, even if both sides of the equation
make sense. For example, if T ∈ L(P(R), P(R)) is the differentiation
map defined earlier in this section and S ∈ L(P(R), P(R)) is the
multiplication by x^2 map defined earlier in this section, then

((ST)p)(x) = x^2 p'(x) but ((TS)p)(x) = x^2 p'(x) + 2x p(x).

In other words, multiplying by x^2 and then differentiating is not the
same as differentiating and then multiplying by x^2.
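
The failure of commutativity can be seen on a concrete polynomial. In the sketch below, which is an illustration under stated assumptions rather than anything from the text, a polynomial c_0 + c_1 x + c_2 x^2 + ... is stored as the coefficient list [c_0, c_1, c_2, ...], and the function names differentiate and multiply_by_x2 are hypothetical stand-ins for T and S.

```python
def differentiate(p):
    """T: p -> p', acting on a coefficient list [c_0, c_1, c_2, ...]."""
    # The derivative of c_k x^k is k c_k x^(k-1); the constant term drops out.
    return [k * c for k, c in enumerate(p)][1:] or [0]


def multiply_by_x2(p):
    """S: p(x) -> x^2 p(x), which shifts every coefficient up two places."""
    return [0, 0] + list(p)


p = [1, 2, 3]  # p(x) = 1 + 2x + 3x^2

st = multiply_by_x2(differentiate(p))  # (ST)p = x^2 p'(x)
ts = differentiate(multiply_by_x2(p))  # (TS)p = (x^2 p(x))'

print(st)  # [0, 0, 2, 6]       i.e. 2x^2 + 6x^3
print(ts)  # [0, 2, 6, 12]      i.e. 2x + 6x^2 + 12x^3
```

The difference between the two results is 2x + 4x^2 + 6x^3, which is 2x p(x), in agreement with the formula above.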

Null Spaces and Ranges

Some mathematicians 
use the term kernel 
instead of null space. 


For T ∈ L(V, W), the null space of T, denoted null T, is the subset
of V consisting of those vectors that T maps to 0:

null T = {v ∈ V : Tv = 0}.

Let’s look at a few examples from the previous section. In the
differentiation example, we defined T ∈ L(P(R), P(R)) by Tp = p'. The
only functions whose derivative equals the zero function are the
constant functions, so in this case the null space of T equals the set of
constant functions.

In the multiplication by x^2 example, we defined T ∈ L(P(R), P(R))
by (Tp)(x) = x^2 p(x). The only polynomial p such that x^2 p(x) = 0
for all x ∈ R is the 0 polynomial. Thus in this case we have

null T = {0}.

In the backward shift example, we defined T ∈ L(F^∞, F^∞) by

T(x_1, x_2, x_3, ...) = (x_2, x_3, ...).

Clearly T(x_1, x_2, x_3, ...) equals 0 if and only if x_2, x_3, ... are all 0. Thus
in this case we have

null T = {(a, 0, 0, ...) : a ∈ F}.


The next proposition shows that the null space of any linear map is
a subspace of the domain. In particular, 0 is in the null space of every
linear map.

3.1 Proposition: If T ∈ L(V, W), then null T is a subspace of V.

Proof: Suppose T ∈ L(V, W). By additivity, we have

T(0) = T(0 + 0) = T(0) + T(0),

which implies that T(0) = 0. Thus 0 ∈ null T.

If u, v ∈ null T, then

T(u + v) = Tu + Tv = 0 + 0 = 0,

and hence u + v ∈ null T. Thus null T is closed under addition.

If u ∈ null T and a ∈ F, then

T(au) = aTu = a0 = 0,

and hence au ∈ null T. Thus null T is closed under scalar
multiplication.

We have shown that null T contains 0 and is closed under addition
and scalar multiplication. Thus null T is a subspace of V. ■

A linear map T: V → W is called injective if whenever u, v ∈ V
and Tu = Tv, we have u = v. The next proposition says that we
can check whether a linear map is injective by checking whether 0 is
the only vector that gets mapped to 0. As a simple application of this
proposition, we see that of the three linear maps whose null spaces we
computed earlier in this section (differentiation, multiplication by x^2,
and backward shift), only multiplication by x^2 is injective.


Many mathematicians
use the term 
one-to-one, which 
means the same as 
injective. 


3.2 Proposition: Let T ∈ L(V, W). Then T is injective if and only
if null T = {0}.

Proof: First suppose that T is injective. We want to prove that
null T = {0}. We already know that {0} ⊂ null T (by 3.1). To prove the
inclusion in the other direction, suppose v ∈ null T. Then

T(v) = 0 = T(0).

Because T is injective, the equation above implies that v = 0. Thus
null T = {0}, as desired.

To prove the implication in the other direction, now suppose that
null T = {0}. We want to prove that T is injective. To do this, suppose
u, v ∈ V and Tu = Tv. Then

0 = Tu - Tv = T(u - v).

Thus u - v is in null T, which equals {0}. Hence u - v = 0, which
implies that u = v. Hence T is injective, as desired. ■


For T ∈ L(V, W), the range of T, denoted range T, is the subset of
W consisting of those vectors that are of the form Tv for some v ∈ V:

range T = {Tv : v ∈ V}.

For example, if T ∈ L(P(R), P(R)) is the differentiation map defined by
Tp = p', then range T = P(R) because for every polynomial q ∈ P(R)
there exists a polynomial p ∈ P(R) such that p' = q.

As another example, if T ∈ L(P(R), P(R)) is the linear map of
multiplication by x^2 defined by (Tp)(x) = x^2 p(x), then the range
of T is the set of polynomials of the form a_2 x^2 + ... + a_m x^m, where
a_2, ..., a_m ∈ R.

The next proposition shows that the range of any linear map is a 
subspace of the target space. 


Some mathematicians
use the word image,
which means the same
as range.

3.3 Proposition: If T ∈ L(V, W), then range T is a subspace of W.

Proof: Suppose T ∈ L(V, W). Then T(0) = 0 (by 3.1), which
implies that 0 ∈ range T.

If w_1, w_2 ∈ range T, then there exist v_1, v_2 ∈ V such that Tv_1 = w_1
and Tv_2 = w_2. Thus

T(v_1 + v_2) = Tv_1 + Tv_2 = w_1 + w_2,

and hence w_1 + w_2 ∈ range T. Thus range T is closed under addition.

If w ∈ range T and a ∈ F, then there exists v ∈ V such that Tv = w.
Thus

T(av) = aTv = aw,

and hence aw ∈ range T. Thus range T is closed under scalar
multiplication.

We have shown that range T contains 0 and is closed under addition
and scalar multiplication. Thus range T is a subspace of W. ■


Many mathematicians
use the term onto, 
which means the same 
as surjective. 


A linear map T: V → W is called surjective if its range equals W.
For example, the differentiation map T ∈ L(P(R), P(R)) defined by
Tp = p' is surjective because its range equals P(R). As another
example, the linear map T ∈ L(P(R), P(R)) defined by (Tp)(x) = x^2 p(x) is
not surjective because its range does not equal P(R). As a final
example, you should verify that the backward shift T ∈ L(F^∞, F^∞) defined
by

T(x_1, x_2, x_3, ...) = (x_2, x_3, ...)

is surjective.

Whether a linear map is surjective can depend upon what we are
thinking of as the target space. For example, fix a positive integer m.
The differentiation map T ∈ L(P_m(R), P_m(R)) defined by Tp = p'
is not surjective because the polynomial x^m is not in the range of T.
However, the differentiation map T ∈ L(P_m(R), P_{m-1}(R)) defined by
Tp = p' is surjective because its range equals P_{m-1}(R), which is now
the target space.

The next theorem, which is the key result in this chapter, states that
the dimension of the null space plus the dimension of the range of a
linear map on a finite-dimensional vector space equals the dimension
of the domain.

3.4 Theorem: If V is finite dimensional and T ∈ L(V, W), then
range T is a finite-dimensional subspace of W and

dim V = dim null T + dim range T.

Proof: Suppose that V is a finite-dimensional vector space and
T ∈ L(V, W). Let (u_1, ..., u_m) be a basis of null T; thus dim null T = m.
The linearly independent list (u_1, ..., u_m) can be extended to a basis
(u_1, ..., u_m, w_1, ..., w_n) of V (by 2.12). Thus dim V = m + n,
and to complete the proof, we need only show that range T is finite
dimensional and dim range T = n. We will do this by proving that
(Tw_1, ..., Tw_n) is a basis of range T.

Let v ∈ V. Because (u_1, ..., u_m, w_1, ..., w_n) spans V, we can write

v = a_1 u_1 + ... + a_m u_m + b_1 w_1 + ... + b_n w_n,

where the a’s and b’s are in F. Applying T to both sides of this equation,
we get

Tv = b_1 Tw_1 + ... + b_n Tw_n,

where the terms of the form Tu_j disappeared because each u_j ∈ null T.
The last equation implies that (Tw_1, ..., Tw_n) spans range T. In
particular, range T is finite dimensional.

To show that (Tw_1, ..., Tw_n) is linearly independent, suppose that
c_1, ..., c_n ∈ F and

c_1 Tw_1 + ... + c_n Tw_n = 0.

Then

T(c_1 w_1 + ... + c_n w_n) = 0,

and hence

c_1 w_1 + ... + c_n w_n ∈ null T.

Because (u_1, ..., u_m) spans null T, we can write

c_1 w_1 + ... + c_n w_n = d_1 u_1 + ... + d_m u_m,

where the d’s are in F. This equation implies that all the c’s (and d’s)
are 0 (because (u_1, ..., u_m, w_1, ..., w_n) is linearly independent). Thus
(Tw_1, ..., Tw_n) is linearly independent and hence is a basis for range T,
as desired. ■
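
The theorem can be checked numerically on a small example. The sketch below is illustrative only and rests on assumptions not in the text: T is a map from Q^4 to Q^3 given by a hypothetical coefficient matrix A, and the helpers rref, null_space_basis, and apply are names invented for this illustration. Unlike the determinant-free proof above, the sketch relies on row reduction to produce a basis of null T and to count the dimension of range T.

```python
from fractions import Fraction


def rref(matrix):
    """Reduced row echelon form over the rationals, with its pivot columns."""
    m = [[Fraction(x) for x in row] for row in matrix]
    pivots, r = [], 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        lead = m[r][c]
        m[r] = [x / lead for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    return m, pivots


def null_space_basis(matrix):
    """One basis vector of the null space for each non-pivot (free) column."""
    m, pivots = rref(matrix)
    n = len(matrix[0])
    basis = []
    for f in (c for c in range(n) if c not in pivots):
        v = [Fraction(0)] * n
        v[f] = Fraction(1)
        for row, p in zip(m, pivots):
            v[p] = -row[f]
        basis.append(v)
    return basis


def apply(matrix, x):
    """Compute Tx, where T is given by the coefficient matrix."""
    return [sum(a * xk for a, xk in zip(row, x)) for row in matrix]


A = [[1, 2, 0, 1], [0, 0, 1, 1], [1, 2, 1, 2]]  # hypothetical T: Q^4 -> Q^3
null_basis = null_space_basis(A)
_, pivot_columns = rref(A)

dim_domain = len(A[0])
dim_null = len(null_basis)
dim_range = len(pivot_columns)  # the pivot columns of A form a basis of range T

assert all(apply(A, v) == [0, 0, 0] for v in null_basis)
print(dim_domain, dim_null, dim_range)  # 4 2 2
```

In this example the third row of A is the sum of the first two, so range T has dimension 2, and the two null vectors found account for the remaining dimensions of the domain.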
Now we can show that no linear map from a finite-dimensional
vector space to a “smaller” vector space can be injective, where “smaller”
is measured by dimension.

3.5 Corollary: If V and W are finite-dimensional vector spaces such
that dim V > dim W, then no linear map from V to W is injective.

Proof: Suppose V and W are finite-dimensional vector spaces such
that dim V > dim W. Let T ∈ L(V, W). Then

dim null T = dim V - dim range T
           ≥ dim V - dim W
           > 0,

where the equality above comes from 3.4. We have just shown that
dim null T > 0. This means that null T must contain vectors other
than 0. Thus T is not injective (by 3.2). ■

The next corollary, which is in some sense dual to the previous
corollary, shows that no linear map from a finite-dimensional vector space
to a “bigger” vector space can be surjective, where “bigger” is measured
by dimension.

3.6 Corollary: If V and W are finite-dimensional vector spaces such
that dim V < dim W, then no linear map from V to W is surjective.

Proof: Suppose V and W are finite-dimensional vector spaces such
that dim V < dim W. Let T ∈ L(V, W). Then

dim range T = dim V - dim null T
            ≤ dim V
            < dim W,

where the equality above comes from 3.4. We have just shown that
dim range T < dim W. This means that range T cannot equal W. Thus
T is not surjective. ■

The last two corollaries have important consequences in the theory
of linear equations. To see this, fix positive integers m and n, and let
a_{j,k} ∈ F for j = 1, ..., m and k = 1, ..., n. Define T: F^n → F^m by

T(x_1, ..., x_n) = (Σ_{k=1}^n a_{1,k} x_k, ..., Σ_{k=1}^n a_{m,k} x_k).

Now consider the equation Tx = 0 (where x ∈ F^n and the 0 here is
the additive identity in F^m, namely, the list of length m consisting of
all 0’s). Letting x = (x_1, ..., x_n), we can rewrite the equation Tx = 0
as a system of homogeneous equations:

Σ_{k=1}^n a_{1,k} x_k = 0
          ...
Σ_{k=1}^n a_{m,k} x_k = 0.

Homogeneous, in this
context, means that the
constant term on the
right side of each
equation equals 0.

We think of the a’s as known; we are interested in solutions for the
variables x_1, ..., x_n. Thus we have m equations and n variables.
Obviously x_1 = ... = x_n = 0 is a solution; the key question here is whether
any other solutions exist. In other words, we want to know if null T is
strictly bigger than {0}. This happens precisely when T is not injective
(by 3.2). From 3.5 we see that T is not injective if n > m. Conclusion:
a homogeneous system of linear equations in which there are more
variables than equations must have nonzero solutions.

With T as in the previous paragraph, now consider the equation
Tx = c, where c = (c_1, ..., c_m) ∈ F^m. We can rewrite the equation
Tx = c as a system of inhomogeneous equations:

Σ_{k=1}^n a_{1,k} x_k = c_1
          ...
Σ_{k=1}^n a_{m,k} x_k = c_m.

As before, we think of the a’s as known. The key question here is
whether for every choice of the constant terms c_1, ..., c_m ∈ F, there
exists at least one solution for the variables x_1, ..., x_n. In other words,
we want to know whether range T equals F^m. From 3.6 we see that T
is not surjective if n < m. Conclusion: an inhomogeneous system of
linear equations in which there are more equations than variables has
no solution for some choice of the constant terms.


These results about 
homogeneous systems 
with more variables 
than equations and 
inhomogeneous 
systems with more 
equations than 
variables are often 
proved using Gaussian 
elimination. The 
abstract approach 
taken here leads to 
cleaner proofs. 
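
The conclusion about inhomogeneous systems can be illustrated by exhibiting, for a specific system with more equations than variables, a right side c for which no solution exists. The sketch below is an assumption-laden illustration, not the abstract argument of the text: it uses Gaussian elimination over the rationals, a hypothetical 3-by-2 coefficient array, and invented helper names rank and unsolvable_right_side. It uses the fact that Tx = c has a solution exactly when c lies in the span of the columns of the coefficient array.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a list of vectors, by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    ncols = len(m[0]) if m else 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][c] / m[r][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def unsolvable_right_side(a):
    """Return a standard basis vector c of F^m with Tx = c unsolvable, if any."""
    m = len(a)
    columns = [tuple(row[k] for row in a) for k in range(len(a[0]))]
    for j in range(m):
        e_j = tuple(1 if i == j else 0 for i in range(m))
        # e_j is outside range T exactly when adjoining it raises the rank.
        if rank(columns + [e_j]) > rank(columns):
            return e_j
    return None  # every e_j is in range T, so T is surjective


# Three equations in two variables (m = 3 > n = 2); coefficients are hypothetical.
a = [[1, 3], [2, 5], [7, 9]]
print(unsolvable_right_side(a))  # (1, 0, 0)
```

Searching only the standard basis vectors suffices: if each of them were in range T, then range T would be all of F^m, which 3.6 rules out when n < m.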
The Matrix of a Linear Map

We have seen that if (v_1, ..., v_n) is a basis of V and T: V → W is
linear, then the values of Tv_1, ..., Tv_n determine the values of T on
arbitrary vectors in V. In this section we will see how matrices are used
as an efficient method of recording the values of the Tv_j’s in terms of
a basis of W.

Let m and n denote positive integers. An m-by-n matrix is a
rectangular array with m rows and n columns that looks like this:

        [ a_{1,1}  ...  a_{1,n} ]
3.7     [   ...           ...   ]
        [ a_{m,1}  ...  a_{m,n} ]

Note that the first index refers to the row number and the second
index refers to the column number. Thus a_{3,2} refers to the entry in the
third row, second column of the matrix above. We will usually consider
matrices whose entries are elements of F.

Let T ∈ L(V, W). Suppose that (v_1, ..., v_n) is a basis of V and
(w_1, ..., w_m) is a basis of W. For each k = 1, ..., n, we can write Tv_k
uniquely as a linear combination of the w’s:

3.8     Tv_k = a_{1,k} w_1 + ... + a_{m,k} w_m,

where a_{j,k} ∈ F for j = 1, ..., m. The scalars a_{j,k} completely determine
the linear map T because a linear map is determined by its values on
a basis. The m-by-n matrix 3.7 formed by the a’s is called the matrix
of T with respect to the bases (v_1, ..., v_n) and (w_1, ..., w_m); we denote
it by

M(T, (v_1, ..., v_n), (w_1, ..., w_m)).

If the bases (v_1, ..., v_n) and (w_1, ..., w_m) are clear from the context
(for example, if only one set of bases is in sight), we write just M(T)
instead of M(T, (v_1, ..., v_n), (w_1, ..., w_m)).

As an aid to remembering how $\mathcal{M}(T)$ is constructed from $T$, you
might write the basis vectors $v_1, \dots, v_n$ for the domain across the
top and the basis vectors $w_1, \dots, w_m$ for the target space along the
left, as follows:

\[
\begin{array}{c|ccccc}
 & v_1 & \dots & v_k & \dots & v_n \\ \hline
w_1 & & & a_{1,k} & & \\
\vdots & & & \vdots & & \\
w_m & & & a_{m,k} & &
\end{array}
\]

Note that in the matrix above only the $k$th column is displayed (and
thus the second index of each displayed $a$ is $k$). The $k$th column of
$\mathcal{M}(T)$ consists of the scalars needed to write $Tv_k$ as a linear
combination of the $w$'s. Thus the picture above should remind you that
$Tv_k$ is retrieved from the matrix $\mathcal{M}(T)$ by multiplying each
entry in the $k$th column by the corresponding $w$ from the left column,
and then adding up the resulting vectors.

With respect to any choice of bases, the matrix of the 0 linear map
(the linear map that takes every vector to 0) consists of all 0's.

If $T$ is a linear map from $\mathbf{F}^n$ to $\mathbf{F}^m$, then unless
stated otherwise you should assume that the bases in question are the
standard ones (where the $k$th basis vector is 1 in the $k$th slot and 0
in all the other slots). If you think of elements of $\mathbf{F}^m$ as
columns of $m$ numbers, then you can think of the $k$th column of
$\mathcal{M}(T)$ as $T$ applied to the $k$th basis vector. For example, if
$T \in \mathcal{L}(\mathbf{F}^2, \mathbf{F}^3)$ is defined by

\[ T(x, y) = (x + 3y, 2x + 5y, 7x + 9y), \]

then $T(1, 0) = (1, 2, 7)$ and $T(0, 1) = (3, 5, 9)$, so the matrix of $T$
(with respect to the standard bases) is the 3-by-2 matrix

\[
\begin{bmatrix}
1 & 3 \\
2 & 5 \\
7 & 9
\end{bmatrix}.
\]
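
The column-by-column recipe for a map between $\mathbf{F}^n$ and
$\mathbf{F}^m$ can be tried out on a computer. The following is a minimal
sketch, not part of the text's development: the helper name `matrix_of` is
our own, vectors are represented as Python tuples, and a matrix is a list
of rows.

```python
def matrix_of(T, n):
    """Matrix of T: F^n -> F^m with respect to the standard bases.

    T takes a tuple of length n and returns a tuple of length m.
    Column k of the result is T applied to the k-th standard basis vector.
    """
    columns = []
    for k in range(n):
        e_k = tuple(1 if i == k else 0 for i in range(n))
        columns.append(T(e_k))
    m = len(columns[0])
    # Turn the list of columns into a list of rows.
    return [[columns[k][j] for k in range(n)] for j in range(m)]


def T(v):
    # The example map from F^2 to F^3 used in the text.
    x, y = v
    return (x + 3 * y, 2 * x + 5 * y, 7 * x + 9 * y)


M_T = matrix_of(T, 2)
print(M_T)
```

Here the first column of `M_T` is `T((1, 0))` and the second is
`T((0, 1))`, exactly as in the example above.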

Suppose we have bases $(v_1, \dots, v_n)$ of $V$ and $(w_1, \dots, w_m)$
of $W$. Thus for each linear map from $V$ to $W$, we can talk about its
matrix (with respect to these bases, of course). Is the matrix of the
sum of two linear maps equal to the sum of the matrices of the two maps?

Right now this question does not make sense because, though we
have defined the sum of two linear maps, we have not defined the sum
of two matrices. Fortunately the obvious definition of the sum of two
matrices has the right properties. Specifically, we define addition of
matrices of the same size by adding corresponding entries in the
matrices:

\[
\begin{bmatrix}
a_{1,1} & \dots & a_{1,n} \\
\vdots & & \vdots \\
a_{m,1} & \dots & a_{m,n}
\end{bmatrix}
+
\begin{bmatrix}
b_{1,1} & \dots & b_{1,n} \\
\vdots & & \vdots \\
b_{m,1} & \dots & b_{m,n}
\end{bmatrix}
=
\begin{bmatrix}
a_{1,1} + b_{1,1} & \dots & a_{1,n} + b_{1,n} \\
\vdots & & \vdots \\
a_{m,1} + b_{m,1} & \dots & a_{m,n} + b_{m,n}
\end{bmatrix}.
\]

You should verify that with this definition of matrix addition,

3.9    \[ \mathcal{M}(T + S) = \mathcal{M}(T) + \mathcal{M}(S) \]

whenever $T, S \in \mathcal{L}(V, W)$.

Still assuming that we have some bases in mind, is the matrix of a
scalar times a linear map equal to the scalar times the matrix of the
linear map? Again the question does not make sense because we have
not defined scalar multiplication on matrices. Fortunately the obvious
definition again has the right properties. Specifically, we define the
product of a scalar and a matrix by multiplying each entry in the matrix
by the scalar:

\[
c
\begin{bmatrix}
a_{1,1} & \dots & a_{1,n} \\
\vdots & & \vdots \\
a_{m,1} & \dots & a_{m,n}
\end{bmatrix}
=
\begin{bmatrix}
ca_{1,1} & \dots & ca_{1,n} \\
\vdots & & \vdots \\
ca_{m,1} & \dots & ca_{m,n}
\end{bmatrix}.
\]

You should verify that with this definition of scalar multiplication on
matrices,

3.10    \[ \mathcal{M}(cT) = c\mathcal{M}(T) \]

whenever $c \in \mathbf{F}$ and $T \in \mathcal{L}(V, W)$.
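
The two verifications requested above (3.9 and 3.10) are proofs to be
written by hand, but a numerical spot check can be reassuring. The sketch
below assumes standard bases on $\mathbf{F}^2$ and $\mathbf{F}^3$; the
maps `T` and `S` and all helper names are illustrative choices of ours,
and one example is of course not a proof.

```python
def matrix_of(T, n):
    """Matrix of T: F^n -> F^m in the standard bases, built column by column."""
    cols = [T(tuple(1 if i == k else 0 for i in range(n))) for k in range(n)]
    return [[col[j] for col in cols] for j in range(len(cols[0]))]


def mat_add(A, B):
    """Entrywise sum of two matrices of the same size."""
    return [[a + b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(A, B)]


def mat_scale(c, A):
    """Multiply every entry of A by the scalar c."""
    return [[c * a for a in row] for row in A]


def T(v):
    x, y = v
    return (x + 3 * y, 2 * x + 5 * y, 7 * x + 9 * y)


def S(v):
    x, y = v
    return (4 * x - y, 0, x + y)


def T_plus_S(v):
    # The sum of two linear maps is defined pointwise.
    return tuple(t + s for t, s in zip(T(v), S(v)))


def three_T(v):
    # The scalar multiple 3T is also defined pointwise.
    return tuple(3 * t for t in T(v))


sum_ok = matrix_of(T_plus_S, 2) == mat_add(matrix_of(T, 2), matrix_of(S, 2))
scale_ok = matrix_of(three_T, 2) == mat_scale(3, matrix_of(T, 2))
print(sum_ok, scale_ok)
```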

Because addition and scalar multiplication have now been defined
for matrices, you should not be surprised that a vector space is about
to appear. We need only a bit of notation so that this new vector space
has a name. The set of all $m$-by-$n$ matrices with entries in $\mathbf{F}$
is denoted by $\operatorname{Mat}(m, n, \mathbf{F})$. You should verify that
with addition and scalar multiplication defined as above,
$\operatorname{Mat}(m, n, \mathbf{F})$ is a vector space. Note that the
additive identity in $\operatorname{Mat}(m, n, \mathbf{F})$ is the
$m$-by-$n$ matrix all of whose entries equal 0.

Suppose $(v_1, \dots, v_n)$ is a basis of $V$ and $(w_1, \dots, w_m)$ is a
basis of $W$. Suppose also that we have another vector space $U$ and that
$(u_1, \dots, u_p)$ is a basis of $U$. Consider linear maps
$S \colon U \to V$ and $T \colon V \to W$. The composition $TS$ is a linear
map from $U$ to $W$. How can $\mathcal{M}(TS)$ be computed from
$\mathcal{M}(T)$ and $\mathcal{M}(S)$? The nicest solution to this question
would be to have the following pretty relationship:

3.11    \[ \mathcal{M}(TS) = \mathcal{M}(T)\mathcal{M}(S). \]

So far, however, the right side of this equation does not make sense
because we have not yet defined the product of two matrices. We will
choose a definition of matrix multiplication that forces the equation
above to hold. Let's see how to do this.

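Let

\[
\mathcal{M}(T) =
\begin{bmatrix}
a_{1,1} & \dots & a_{1,n} \\
\vdots & & \vdots \\
a_{m,1} & \dots & a_{m,n}
\end{bmatrix}
\quad \text{and} \quad
\mathcal{M}(S) =
\begin{bmatrix}
b_{1,1} & \dots & b_{1,p} \\
\vdots & & \vdots \\
b_{n,1} & \dots & b_{n,p}
\end{bmatrix}.
\]

For $k \in \{1, \dots, p\}$, we have

\[
\begin{aligned}
TSu_k &= T\Bigl(\sum_{r=1}^{n} b_{r,k} v_r\Bigr) \\
&= \sum_{r=1}^{n} b_{r,k} Tv_r \\
&= \sum_{r=1}^{n} b_{r,k} \sum_{j=1}^{m} a_{j,r} w_j \\
&= \sum_{j=1}^{m} \Bigl(\sum_{r=1}^{n} a_{j,r} b_{r,k}\Bigr) w_j.
\end{aligned}
\]

Thus $\mathcal{M}(TS)$ is the $m$-by-$p$ matrix whose entry in row $j$,
column $k$ equals $\sum_{r=1}^{n} a_{j,r} b_{r,k}$.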

Now it's clear how to define matrix multiplication so that 3.11 holds.
Namely, if $A$ is an $m$-by-$n$ matrix with entries $a_{j,k}$ and $B$ is an
$n$-by-$p$ matrix with entries $b_{j,k}$, then $AB$ is defined to be the
$m$-by-$p$ matrix whose entry in row $j$, column $k$, equals

\[ \sum_{r=1}^{n} a_{j,r} b_{r,k}. \]

In other words, the entry in row $j$, column $k$, of $AB$ is computed by
taking row $j$ of $A$ and column $k$ of $B$, multiplying together
corresponding entries, and then summing. Note that we define the product
of two matrices only when the number of columns of the first matrix
equals the number of rows of the second matrix.

You probably learned this definition of matrix multiplication in an
earlier course, although you may not have seen this motivation for it.
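
The row-times-column rule translates directly into code. The following is
a minimal sketch under our own conventions (a matrix is a list of rows,
and `mat_mul` is a name we chose); it follows the defining sum literally
rather than aiming for efficiency, and it is tried on a 3-by-2 matrix
times a 2-by-4 matrix.

```python
def mat_mul(A, B):
    """Product AB, where entry (j, k) is the sum over r of A[j][r] * B[r][k]."""
    n = len(B)
    if any(len(row) != n for row in A):
        # The product is defined only when columns of A match rows of B.
        raise ValueError("number of columns of A must equal number of rows of B")
    p = len(B[0])
    return [
        [sum(A[j][r] * B[r][k] for r in range(n)) for k in range(p)]
        for j in range(len(A))
    ]


A = [[1, 2], [3, 4], [5, 6]]          # 3-by-2
B = [[6, 5, 4, 3], [2, 1, 0, -1]]     # 2-by-4
AB = mat_mul(A, B)                    # 3-by-4
for row in AB:
    print(row)
```

Calling `mat_mul(B, A)` with these two matrices raises `ValueError`,
because $B$ has four columns while $A$ has three rows.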


As an example of matrix multiplication, here we multiply together
a 3-by-2 matrix and a 2-by-4 matrix, obtaining a 3-by-4 matrix:

\[
\begin{bmatrix}
1 & 2 \\
3 & 4 \\
5 & 6
\end{bmatrix}
\begin{bmatrix}
6 & 5 & 4 & 3 \\
2 & 1 & 0 & -1
\end{bmatrix}
=
\begin{bmatrix}
10 & 7 & 4 & 1 \\
26 & 19 & 12 & 5 \\
42 & 31 & 20 & 9
\end{bmatrix}.
\]

You should find an example to show that matrix multiplication is
not commutative. In other words, $AB$ is not necessarily equal to $BA$,
even when both are defined.

Suppose $(v_1, \dots, v_n)$ is a basis of $V$. If $v \in V$, then there
exist unique scalars $b_1, \dots, b_n$ such that

3.12    \[ v = b_1 v_1 + \dots + b_n v_n. \]

The matrix of $v$, denoted $\mathcal{M}(v)$, is the $n$-by-1 matrix
defined by

3.13
\[
\mathcal{M}(v) =
\begin{bmatrix}
b_1 \\
\vdots \\
b_n
\end{bmatrix}.
\]

Usually the basis is obvious from the context, but when the basis needs
to be displayed explicitly use the notation
$\mathcal{M}(v, (v_1, \dots, v_n))$ instead of $\mathcal{M}(v)$.

For example, the matrix of a vector $x \in \mathbf{F}^n$ with respect to
the standard basis is obtained by writing the coordinates of $x$ as the
entries in an $n$-by-1 matrix. In other words, if
$x = (x_1, \dots, x_n) \in \mathbf{F}^n$, then

\[
\mathcal{M}(x) =
\begin{bmatrix}
x_1 \\
\vdots \\
x_n
\end{bmatrix}.
\]

The next proposition shows how the notions of the matrix of a linear
map, the matrix of a vector, and matrix multiplication fit together. In
this proposition $\mathcal{M}(Tv)$ is the matrix of the vector $Tv$ with
respect to the basis $(w_1, \dots, w_m)$ and $\mathcal{M}(v)$ is the matrix
of the vector $v$ with respect to the basis $(v_1, \dots, v_n)$, whereas
$\mathcal{M}(T)$ is the matrix of the linear map $T$ with respect to the
bases $(v_1, \dots, v_n)$ and $(w_1, \dots, w_m)$.


3.14 Proposition: Suppose $T \in \mathcal{L}(V, W)$ and
$(v_1, \dots, v_n)$ is a basis of $V$ and $(w_1, \dots, w_m)$ is a basis
of $W$. Then

\[ \mathcal{M}(Tv) = \mathcal{M}(T)\mathcal{M}(v) \]

for every $v \in V$.

Proof: Let

3.15
\[
\mathcal{M}(T) =
\begin{bmatrix}
a_{1,1} & \dots & a_{1,n} \\
\vdots & & \vdots \\
a_{m,1} & \dots & a_{m,n}
\end{bmatrix}.
\]

This means, we recall, that

3.16    \[ Tv_k = \sum_{j=1}^{m} a_{j,k} w_j \]

for each $k$. Let $v$ be an arbitrary vector in $V$, which we can write in
the form 3.12. Thus $\mathcal{M}(v)$ is given by 3.13. Now

\[
\begin{aligned}
Tv &= b_1 Tv_1 + \dots + b_n Tv_n \\
&= b_1 \sum_{j=1}^{m} a_{j,1} w_j + \dots + b_n \sum_{j=1}^{m} a_{j,n} w_j \\
&= \sum_{j=1}^{m} (a_{j,1} b_1 + \dots + a_{j,n} b_n) w_j,
\end{aligned}
\]

where the first equality comes from 3.12 and the second equality comes
from 3.16. The last equation shows that $\mathcal{M}(Tv)$, the $m$-by-1
matrix of the vector $Tv$ with respect to the basis $(w_1, \dots, w_m)$,
is given by the equation

\[
\mathcal{M}(Tv) =
\begin{bmatrix}
a_{1,1} b_1 + \dots + a_{1,n} b_n \\
\vdots \\
a_{m,1} b_1 + \dots + a_{m,n} b_n
\end{bmatrix}.
\]

This formula, along with the formulas 3.15 and 3.13 and the definition
of matrix multiplication, shows that
$\mathcal{M}(Tv) = \mathcal{M}(T)\mathcal{M}(v)$. ■
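
Proposition 3.14 can be spot-checked in the simplest setting, where
$V = \mathbf{F}^2$ and $W = \mathbf{F}^3$ carry their standard bases, so
that the matrix of a vector is just its coordinates written as a column.
This is a sketch under those assumptions only; the map `T`, the vector
`v`, and the helper names are illustrative choices of ours.

```python
def mat_mul(A, B):
    """Matrix product from the row-times-column definition."""
    return [
        [sum(A[j][r] * B[r][k] for r in range(len(B))) for k in range(len(B[0]))]
        for j in range(len(A))
    ]


def column(v):
    """Matrix of a vector in F^n with respect to the standard basis (n-by-1)."""
    return [[x] for x in v]


def T(v):
    x, y = v
    return (x + 3 * y, 2 * x + 5 * y, 7 * x + 9 * y)


M_T = [[1, 3], [2, 5], [7, 9]]   # matrix of T in the standard bases

v = (2, -1)
lhs = column(T(v))               # M(Tv)
rhs = mat_mul(M_T, column(v))    # M(T) M(v)
print(lhs == rhs)
```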

Invertibility

A linear map $T \in \mathcal{L}(V, W)$ is called invertible if there
exists a linear map $S \in \mathcal{L}(W, V)$ such that $ST$ equals the
identity map on $V$ and $TS$ equals the identity map on $W$. A linear map
$S \in \mathcal{L}(W, V)$ satisfying $ST = I$ and $TS = I$ is called an
inverse of $T$ (note that the first $I$ is the identity map on $V$ and the
second $I$ is the identity map on $W$).

If $S$ and $S'$ are inverses of $T$, then

\[ S = SI = S(TS') = (ST)S' = IS' = S', \]

so $S = S'$. In other words, if $T$ is invertible, then it has a unique
inverse, which we denote by $T^{-1}$. Rephrasing all this once more, if
$T \in \mathcal{L}(V, W)$ is invertible, then $T^{-1}$ is the unique
element of $\mathcal{L}(W, V)$ such that $T^{-1}T = I$ and $TT^{-1} = I$.
The following proposition characterizes the invertible linear maps.

3.17 Proposition: A linear map is invertible if and only if it is
injective and surjective.

Proof: Suppose $T \in \mathcal{L}(V, W)$. We need to show that $T$ is
invertible if and only if it is injective and surjective.

First suppose that $T$ is invertible. To show that $T$ is injective,
suppose that $u, v \in V$ and $Tu = Tv$. Then

\[ u = T^{-1}(Tu) = T^{-1}(Tv) = v, \]

so $u = v$. Hence $T$ is injective.

We are still assuming that $T$ is invertible. Now we want to prove
that $T$ is surjective. To do this, let $w \in W$. Then
$w = T(T^{-1}w)$, which shows that $w$ is in the range of $T$. Thus
$\operatorname{range} T = W$, and hence $T$ is surjective, completing this
direction of the proof.

Now suppose that $T$ is injective and surjective. We want to prove
that $T$ is invertible. For each $w \in W$, define $Sw$ to be the unique
element of $V$ such that $T(Sw) = w$ (the existence and uniqueness of such
an element follow from the surjectivity and injectivity of $T$). Clearly
$TS$ equals the identity map on $W$. To prove that $ST$ equals the
identity map on $V$, let $v \in V$. Then

\[ T(STv) = (TS)(Tv) = I(Tv) = Tv. \]

This equation implies that $STv = v$ (because $T$ is injective), and thus
$ST$ equals the identity map on $V$. To complete the proof, we need to
show that $S$ is linear. To do this, let $w_1, w_2 \in W$. Then

\[ T(Sw_1 + Sw_2) = T(Sw_1) + T(Sw_2) = w_1 + w_2. \]

Thus $Sw_1 + Sw_2$ is the unique element of $V$ that $T$ maps to
$w_1 + w_2$. By the definition of $S$, this implies that
$S(w_1 + w_2) = Sw_1 + Sw_2$. Hence $S$ satisfies the additive property
required for linearity. The proof of homogeneity is similar.
Specifically, if $w \in W$ and $a \in \mathbf{F}$, then

\[ T(aSw) = aT(Sw) = aw. \]

Thus $aSw$ is the unique element of $V$ that $T$ maps to $aw$. By the
definition of $S$, this implies that $S(aw) = aSw$. Hence $S$ is linear, as
desired. ■


Two vector spaces are called isomorphic if there is an invertible 
linear map from one vector space onto the other one. As abstract vector 
spaces, two isomorphic spaces have the same properties. From this 
viewpoint, you can think of an invertible linear map as a relabeling of 
the elements of a vector space. 

If two vector spaces are isomorphic and one of them is finite
dimensional, then so is the other one. To see this, suppose that $V$ and
$W$ are isomorphic and that $T \in \mathcal{L}(V, W)$ is an invertible
linear map. If $V$ is finite dimensional, then so is $W$ (by 3.4). The same
reasoning, with $T$ replaced with $T^{-1} \in \mathcal{L}(W, V)$, shows
that if $W$ is finite dimensional, then so is $V$. Actually much more is
true, as the following theorem shows.


The Greek word isos 
means equal; the Greek 
word morph means 
shape. Thus 
isomorphic literally 
means equal shape. 


3.18 Theorem: Two finite-dimensional vector spaces are isomorphic
if and only if they have the same dimension.

Proof: First suppose $V$ and $W$ are isomorphic finite-dimensional
vector spaces. Thus there exists an invertible linear map $T$ from $V$
onto $W$. Because $T$ is invertible, we have
$\operatorname{null} T = \{0\}$ and $\operatorname{range} T = W$. Thus
$\dim \operatorname{null} T = 0$ and
$\dim \operatorname{range} T = \dim W$. The formula

\[ \dim V = \dim \operatorname{null} T + \dim \operatorname{range} T \]

(see 3.4) thus becomes the equation $\dim V = \dim W$, completing the
proof in one direction.

To prove the other direction, suppose $V$ and $W$ are finite-dimensional
vector spaces with the same dimension. Let $(v_1, \dots, v_n)$ be a
basis of $V$ and $(w_1, \dots, w_n)$ be a basis of $W$. Let $T$ be the
linear map from $V$ to $W$ defined by

\[ T(a_1 v_1 + \dots + a_n v_n) = a_1 w_1 + \dots + a_n w_n. \]

Then $T$ is surjective because $(w_1, \dots, w_n)$ spans $W$, and $T$ is
injective because $(w_1, \dots, w_n)$ is linearly independent. Because $T$
is injective and surjective, it is invertible (see 3.17), and hence $V$
and $W$ are isomorphic, as desired. ■

The last theorem implies that every finite-dimensional vector space
is isomorphic to some $\mathbf{F}^n$. Specifically, if $V$ is a
finite-dimensional vector space and $\dim V = n$, then $V$ and
$\mathbf{F}^n$ are isomorphic.

Because every finite-dimensional vector space is isomorphic to some
$\mathbf{F}^n$, why bother with abstract vector spaces? To answer this
question, note that an investigation of $\mathbf{F}^n$ would soon lead to
vector spaces that do not equal $\mathbf{F}^n$. For example, we would
encounter the null space and range of linear maps, the set of matrices
$\operatorname{Mat}(n, n, \mathbf{F})$, and the polynomials
$\mathcal{P}_n(\mathbf{F})$. Though each of these vector spaces is
isomorphic to some $\mathbf{F}^m$, thinking of them that way often adds
complexity but no new insight.

If $(v_1, \dots, v_n)$ is a basis of $V$ and $(w_1, \dots, w_m)$ is a
basis of $W$, then for each $T \in \mathcal{L}(V, W)$, we have a matrix
$\mathcal{M}(T) \in \operatorname{Mat}(m, n, \mathbf{F})$. In other words,
once bases have been fixed for $V$ and $W$, $\mathcal{M}$ becomes a
function from $\mathcal{L}(V, W)$ to
$\operatorname{Mat}(m, n, \mathbf{F})$. Notice that 3.9 and 3.10 show that
$\mathcal{M}$ is a linear map. This linear map is actually invertible, as
we now show.

3.19 Proposition: Suppose that $(v_1, \dots, v_n)$ is a basis of $V$ and
$(w_1, \dots, w_m)$ is a basis of $W$. Then $\mathcal{M}$ is an invertible
linear map between $\mathcal{L}(V, W)$ and
$\operatorname{Mat}(m, n, \mathbf{F})$.

Proof: We have already noted that $\mathcal{M}$ is linear, so we need
only prove that $\mathcal{M}$ is injective and surjective (by 3.17). Both
are easy. Let's begin with injectivity. If $T \in \mathcal{L}(V, W)$ and
$\mathcal{M}(T) = 0$, then $Tv_k = 0$ for $k = 1, \dots, n$. Because
$(v_1, \dots, v_n)$ is a basis of $V$, this implies that $T = 0$. Thus
$\mathcal{M}$ is injective (by 3.2).

To prove that $\mathcal{M}$ is surjective, let

\[
A =
\begin{bmatrix}
a_{1,1} & \dots & a_{1,n} \\
\vdots & & \vdots \\
a_{m,1} & \dots & a_{m,n}
\end{bmatrix}
\]

be a matrix in $\operatorname{Mat}(m, n, \mathbf{F})$. Let $T$ be the
linear map from $V$ to $W$ such that

\[ Tv_k = \sum_{j=1}^{m} a_{j,k} w_j \]

for $k = 1, \dots, n$. Obviously $\mathcal{M}(T)$ equals $A$, and so the
range of $\mathcal{M}$ equals $\operatorname{Mat}(m, n, \mathbf{F})$, as
desired. ■

An obvious basis of Mat(m, n, F) consists of those m-by-n matrices 
that have 0 in all entries except for a 1 in one entry. There are mn such 
matrices, so the dimension of Mat(m, n, F) equals mn. 
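
The "obvious basis" can be listed explicitly. The sketch below (the
function name `standard_basis_matrices` is our own) builds the $mn$
matrices with a single 1 and checks, for one sample matrix, that the
entries of the matrix serve as the coefficients expressing it in this
basis.

```python
def standard_basis_matrices(m, n):
    """All m-by-n matrices with a single entry equal to 1 and the rest 0."""
    basis = []
    for j in range(m):
        for k in range(n):
            E = [[0] * n for _ in range(m)]
            E[j][k] = 1
            basis.append(E)
    return basis


m, n = 2, 3
basis = standard_basis_matrices(m, n)
print(len(basis))   # number of basis matrices, which equals m * n

# Any matrix is the combination of basis matrices weighted by its entries.
A = [[1, 2, 3], [4, 5, 6]]
coeffs = [a for row in A for a in row]   # same row-by-row order as the basis
rebuilt = [
    [sum(c * E[j][k] for c, E in zip(coeffs, basis)) for k in range(n)]
    for j in range(m)
]
print(rebuilt == A)
```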

Now we can determine the dimension of the vector space of linear 
maps from one finite-dimensional vector space to another. 





3.20 Proposition: If $V$ and $W$ are finite dimensional, then
$\mathcal{L}(V, W)$ is finite dimensional and

\[ \dim \mathcal{L}(V, W) = (\dim V)(\dim W). \]

Proof: This follows from the equation
$\dim \operatorname{Mat}(m, n, \mathbf{F}) = mn$, 3.18, and 3.19. ■

A linear map from a vector space to itself is called an operator. If
we want to specify the vector space, we say that a linear map
$T \colon V \to V$ is an operator on $V$. Because we are so often
interested in linear maps from a vector space into itself, we use the
notation $\mathcal{L}(V)$ to denote the set of all operators on $V$. In
other words, $\mathcal{L}(V) = \mathcal{L}(V, V)$.

Recall from 3.17 that a linear map is invertible if it is injective and
surjective. For a linear map of a vector space into itself, you might
wonder whether injectivity alone, or surjectivity alone, is enough to
imply invertibility. On infinite-dimensional vector spaces neither
condition alone implies invertibility. We can see this from some examples
we have already considered. The multiplication by $x^2$ operator (from
$\mathcal{P}(\mathbf{R})$ to itself) is injective but not surjective. The
backward shift (from $\mathbf{F}^\infty$ to itself) is surjective but not
injective. In view of these examples, the next theorem is remarkable: it
states that for maps from a finite-dimensional vector space to itself,
either injectivity or surjectivity alone implies the other condition.

The deepest and most important parts of linear algebra, as well as
most of the rest of this book, deal with operators.

3.21 Theorem: Suppose $V$ is finite dimensional. If
$T \in \mathcal{L}(V)$, then the following are equivalent:

(a) $T$ is invertible;

(b) $T$ is injective;

(c) $T$ is surjective.

Proof: Suppose $T \in \mathcal{L}(V)$. Clearly (a) implies (b).

Now suppose (b) holds, so that $T$ is injective. Thus
$\operatorname{null} T = \{0\}$ (by 3.2). From 3.4 we have

\[
\begin{aligned}
\dim \operatorname{range} T &= \dim V - \dim \operatorname{null} T \\
&= \dim V,
\end{aligned}
\]

which implies that $\operatorname{range} T$ equals $V$ (see Exercise 11 in
Chapter 2). Thus $T$ is surjective. Hence (b) implies (c).

Now suppose (c) holds, so that $T$ is surjective. Thus
$\operatorname{range} T = V$. From 3.4 we have

\[
\begin{aligned}
\dim \operatorname{null} T &= \dim V - \dim \operatorname{range} T \\
&= 0,
\end{aligned}
\]

which implies that $\operatorname{null} T$ equals $\{0\}$. Thus $T$ is
injective (by 3.2), and so $T$ is invertible (we already knew that $T$ was
surjective). Hence (c) implies (a), completing the proof. ■
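
The dimension count in this proof can be mirrored in code for an operator
on $\mathbf{F}^n$ given by a square matrix. The sketch below is our own
illustration, not the book's method: it computes $\dim \operatorname{range} T$
as the rank of the matrix by Gaussian elimination over the rationals
(an assumption made so that the arithmetic is exact), and then reads off
$\dim \operatorname{null} T$ from 3.4.

```python
from fractions import Fraction


def rank(A):
    """Rank of a matrix (list of rows) by Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in A]
    found = 0                      # number of pivots found so far
    n_cols = len(rows[0]) if rows else 0
    for col in range(n_cols):
        # Look for a row at or below position `found` with a nonzero entry here.
        pivot = next(
            (r for r in range(found, len(rows)) if rows[r][col] != 0), None
        )
        if pivot is None:
            continue
        rows[found], rows[pivot] = rows[pivot], rows[found]
        for r in range(found + 1, len(rows)):
            factor = rows[r][col] / rows[found][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[found])]
        found += 1
    return found


def describe(A):
    """Injectivity and surjectivity of the operator on F^n with square matrix A."""
    n = len(A)
    dim_range = rank(A)
    dim_null = n - dim_range       # this is formula 3.4
    return {"injective": dim_null == 0, "surjective": dim_range == n}


invertible = [[2, 1], [1, 1]]
singular = [[1, 2], [2, 4]]
print(describe(invertible))
print(describe(singular))
```

Because `dim_null` is computed as `n - dim_range`, the two flags always
agree. That is precisely the content of the theorem: for an operator on a
finite-dimensional space, the single equation 3.4 ties the two conditions
together.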
Exercises


1. Show that every linear map from a one-dimensional vector space
to itself is multiplication by some scalar. More precisely, prove
that if $\dim V = 1$ and $T \in \mathcal{L}(V, V)$, then there exists
$a \in \mathbf{F}$ such that $Tv = av$ for all $v \in V$.


2. Give an example of a function $f \colon \mathbf{R}^2 \to \mathbf{R}$
such that

\[ f(av) = af(v) \]

for all $a \in \mathbf{R}$ and all $v \in \mathbf{R}^2$ but $f$ is not
linear.

3. Suppose that $V$ is finite dimensional. Prove that any linear map
on a subspace of $V$ can be extended to a linear map on $V$. In
other words, show that if $U$ is a subspace of $V$ and
$S \in \mathcal{L}(U, W)$, then there exists $T \in \mathcal{L}(V, W)$
such that $Tu = Su$ for all $u \in U$.

4. Suppose that $T$ is a linear map from $V$ to $\mathbf{F}$. Prove that
if $u \in V$ is not in $\operatorname{null} T$, then

\[ V = \operatorname{null} T \oplus \{au : a \in \mathbf{F}\}. \]

Exercise 2 shows that homogeneity alone is not enough to imply
that a function is a linear map. Additivity alone is also not
enough to imply that a function is a linear map, although the
proof of this involves advanced tools that are beyond the scope of
this book.


5. Suppose that $T \in \mathcal{L}(V, W)$ is injective and
$(v_1, \dots, v_n)$ is linearly independent in $V$. Prove that
$(Tv_1, \dots, Tv_n)$ is linearly independent in $W$.

6. Prove that if $S_1, \dots, S_n$ are injective linear maps such that
$S_1 \dots S_n$ makes sense, then $S_1 \dots S_n$ is injective.

7. Prove that if $(v_1, \dots, v_n)$ spans $V$ and
$T \in \mathcal{L}(V, W)$ is surjective, then $(Tv_1, \dots, Tv_n)$
spans $W$.

8. Suppose that $V$ is finite dimensional and that
$T \in \mathcal{L}(V, W)$. Prove that there exists a subspace $U$ of $V$
such that $U \cap \operatorname{null} T = \{0\}$ and
$\operatorname{range} T = \{Tu : u \in U\}$.

9. Prove that if $T$ is a linear map from $\mathbf{F}^4$ to
$\mathbf{F}^2$ such that

\[
\operatorname{null} T = \{(x_1, x_2, x_3, x_4) \in \mathbf{F}^4 :
x_1 = 5x_2 \text{ and } x_3 = 7x_4\},
\]

then $T$ is surjective.





10. Prove that there does not exist a linear map from $\mathbf{F}^5$ to
$\mathbf{F}^2$ whose null space equals

\[
\{(x_1, x_2, x_3, x_4, x_5) \in \mathbf{F}^5 :
x_1 = 3x_2 \text{ and } x_3 = x_4 = x_5\}.
\]

11. Prove that if there exists a linear map on $V$ whose null space and
range are both finite dimensional, then $V$ is finite dimensional.

12. Suppose that $V$ and $W$ are both finite dimensional. Prove that
there exists a surjective linear map from $V$ onto $W$ if and only if
$\dim W \le \dim V$.

13. Suppose that $V$ and $W$ are finite dimensional and that $U$ is a
subspace of $V$. Prove that there exists $T \in \mathcal{L}(V, W)$ such
that $\operatorname{null} T = U$ if and only if
$\dim U \ge \dim V - \dim W$.

14. Suppose that $W$ is finite dimensional and $T \in \mathcal{L}(V, W)$.
Prove that $T$ is injective if and only if there exists
$S \in \mathcal{L}(W, V)$ such that $ST$ is the identity map on $V$.

15. Suppose that $V$ is finite dimensional and $T \in \mathcal{L}(V, W)$.
Prove that $T$ is surjective if and only if there exists
$S \in \mathcal{L}(W, V)$ such that $TS$ is the identity map on $W$.

16. Suppose that $U$ and $V$ are finite-dimensional vector spaces and
that $S \in \mathcal{L}(V, W)$, $T \in \mathcal{L}(U, V)$. Prove that

\[ \dim \operatorname{null} ST \le \dim \operatorname{null} S + \dim \operatorname{null} T. \]

17. Prove that the distributive property holds for matrix addition
and matrix multiplication. In other words, suppose $A$, $B$, and $C$
are matrices whose sizes are such that $A(B + C)$ makes sense.
Prove that $AB + AC$ makes sense and that $A(B + C) = AB + AC$.

18. Prove that matrix multiplication is associative. In other words,
suppose $A$, $B$, and $C$ are matrices whose sizes are such that
$(AB)C$ makes sense. Prove that $A(BC)$ makes sense and that
$(AB)C = A(BC)$.

19. Suppose $T \in \mathcal{L}(\mathbf{F}^n, \mathbf{F}^m)$ and that

\[
\mathcal{M}(T) =
\begin{bmatrix}
a_{1,1} & \dots & a_{1,n} \\
\vdots & & \vdots \\
a_{m,1} & \dots & a_{m,n}
\end{bmatrix},
\]

where we are using the standard bases. Prove that

\[
T(x_1, \dots, x_n) = (a_{1,1} x_1 + \dots + a_{1,n} x_n, \dots,
a_{m,1} x_1 + \dots + a_{m,n} x_n)
\]

for every $(x_1, \dots, x_n) \in \mathbf{F}^n$.

This exercise shows that $T$ has the form promised on page 39.


20. Suppose $(v_1, \dots, v_n)$ is a basis of $V$. Prove that the function
$T \colon V \to \operatorname{Mat}(n, 1, \mathbf{F})$ defined by

\[ Tv = \mathcal{M}(v) \]

is an invertible linear map of $V$ onto
$\operatorname{Mat}(n, 1, \mathbf{F})$; here $\mathcal{M}(v)$ is the matrix
of $v \in V$ with respect to the basis $(v_1, \dots, v_n)$.

21. Prove that every linear map from $\operatorname{Mat}(n, 1, \mathbf{F})$
to $\operatorname{Mat}(m, 1, \mathbf{F})$ is given by a matrix
multiplication. In other words, prove that if
$T \in \mathcal{L}(\operatorname{Mat}(n, 1, \mathbf{F}), \operatorname{Mat}(m, 1, \mathbf{F}))$,
then there exists an $m$-by-$n$ matrix $A$ such that $TB = AB$ for every
$B \in \operatorname{Mat}(n, 1, \mathbf{F})$.

22. Suppose that $V$ is finite dimensional and $S, T \in \mathcal{L}(V)$.
Prove that $ST$ is invertible if and only if both $S$ and $T$ are
invertible.

23. Suppose that $V$ is finite dimensional and $S, T \in \mathcal{L}(V)$.
Prove that $ST = I$ if and only if $TS = I$.

24. Suppose that $V$ is finite dimensional and $T \in \mathcal{L}(V)$.
Prove that $T$ is a scalar multiple of the identity if and only if
$ST = TS$ for every $S \in \mathcal{L}(V)$.

25. Prove that if $V$ is finite dimensional with $\dim V > 1$, then the
set of noninvertible operators on $V$ is not a subspace of
$\mathcal{L}(V)$.

26. Suppose $n$ is a positive integer and $a_{i,j} \in \mathbf{F}$ for
$i, j = 1, \dots, n$. Prove that the following are equivalent:

(a) The trivial solution $x_1 = \dots = x_n = 0$ is the only solution
to the homogeneous system of equations

\[
\begin{aligned}
\sum_{k=1}^{n} a_{1,k} x_k &= 0 \\
&\;\;\vdots \\
\sum_{k=1}^{n} a_{n,k} x_k &= 0.
\end{aligned}
\]

(b) For every $c_1, \dots, c_n \in \mathbf{F}$, there exists a solution
to the system of equations

\[
\begin{aligned}
\sum_{k=1}^{n} a_{1,k} x_k &= c_1 \\
&\;\;\vdots \\
\sum_{k=1}^{n} a_{n,k} x_k &= c_n.
\end{aligned}
\]

Note that here we have the same number of equations as variables.



Chapter 4 


Polynomials


This short chapter contains no linear algebra. It does contain the 
background material on polynomials that we will need in our study 
of linear maps from a vector space to itself. Many of the results in 
this chapter will already be familiar to you from other courses; they 
are included here for completeness. Because this chapter is not about 
linear algebra, your instructor may go through it rapidly. You may not 
be asked to scrutinize all the proofs. Make sure, however, that you 
at least read and understand the statements of all the results in this 
chapter—they will be used in the rest of the book. 


Recall that F denotes R or C. 
When necessary, use the obvious arithmetic with $-\infty$. For example,
$-\infty < m$ and $-\infty + m = -\infty$ for every integer $m$. The 0
polynomial is declared to have degree $-\infty$ so that exceptions are
not needed for various reasonable results. For example, the degree of
$pq$ equals the degree of $p$ plus the degree of $q$ even if $p = 0$.


Degree 

Recall that a function $p \colon \mathbf{F} \to \mathbf{F}$ is called a
polynomial with coefficients in $\mathbf{F}$ if there exist
$a_0, \dots, a_m \in \mathbf{F}$ such that

\[ p(z) = a_0 + a_1 z + a_2 z^2 + \dots + a_m z^m \]

for all $z \in \mathbf{F}$. If $p$ can be written in the form above with
$a_m \ne 0$, then we say that $p$ has degree $m$. If all the coefficients
$a_0, \dots, a_m$ equal 0, then we say that $p$ has degree $-\infty$. For
all we know at this stage, a polynomial may have more than one degree
because we have not yet proved that the coefficients in the equation
above are uniquely determined by the function $p$.

Recall that $\mathcal{P}(\mathbf{F})$ denotes the vector space of all
polynomials with coefficients in $\mathbf{F}$ and that
$\mathcal{P}_m(\mathbf{F})$ is the subspace of $\mathcal{P}(\mathbf{F})$
consisting of the polynomials with coefficients in $\mathbf{F}$ and degree
at most $m$. A number $\lambda \in \mathbf{F}$ is called a root of a
polynomial $p \in \mathcal{P}(\mathbf{F})$ if

\[ p(\lambda) = 0. \]

Roots play a crucial role in the study of polynomials. We begin by
showing that $\lambda$ is a root of $p$ if and only if $p$ is a polynomial
multiple of $z - \lambda$.

4.1 Proposition: Suppose $p \in \mathcal{P}(\mathbf{F})$ is a polynomial
with degree $m \ge 1$. Let $\lambda \in \mathbf{F}$. Then $\lambda$ is a
root of $p$ if and only if there is a polynomial
$q \in \mathcal{P}(\mathbf{F})$ with degree $m - 1$ such that

4.2    \[ p(z) = (z - \lambda) q(z) \]

for all $z \in \mathbf{F}$.

Proof: One direction is obvious. Namely, suppose there is a polynomial
$q \in \mathcal{P}(\mathbf{F})$ such that 4.2 holds. Then

\[ p(\lambda) = (\lambda - \lambda) q(\lambda) = 0, \]

and hence $\lambda$ is a root of $p$, as desired.

To prove the other direction, suppose that $\lambda \in \mathbf{F}$ is a
root of $p$. Let $a_0, \dots, a_m \in \mathbf{F}$ be such that $a_m \ne 0$
and

\[ p(z) = a_0 + a_1 z + a_2 z^2 + \dots + a_m z^m \]

for all $z \in \mathbf{F}$. Because $p(\lambda) = 0$, we have

\[ 0 = a_0 + a_1 \lambda + a_2 \lambda^2 + \dots + a_m \lambda^m. \]

Subtracting the last two equations, we get

\[ p(z) = a_1 (z - \lambda) + a_2 (z^2 - \lambda^2) + \dots + a_m (z^m - \lambda^m) \]

for all $z \in \mathbf{F}$. For each $j = 2, \dots, m$, we can write

\[ z^j - \lambda^j = (z - \lambda) q_{j-1}(z) \]

for all $z \in \mathbf{F}$, where $q_{j-1}$ is a polynomial with degree
$j - 1$ (specifically, take
$q_{j-1}(z) = z^{j-1} + z^{j-2} \lambda + \dots + z \lambda^{j-2} + \lambda^{j-1}$).
Thus

\[
p(z) = (z - \lambda)
\underbrace{\bigl(a_1 + a_2 q_1(z) + \dots + a_m q_{m-1}(z)\bigr)}_{q(z)}
\]

for all $z \in \mathbf{F}$. Clearly $q$ is a polynomial with degree
$m - 1$, as desired. ■
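
The factor $q$ can also be computed directly. The sketch below uses
synthetic division (Horner's scheme), which is a different route from the
formula for $q$ in the proof but produces the same polynomial when
$\lambda$ is a root. Polynomials are stored as coefficient lists
`[a_0, a_1, ..., a_m]`; that convention and the name `divide_by_linear`
are our own assumptions.

```python
def divide_by_linear(coeffs, lam):
    """Divide p by (z - lam).

    coeffs = [a_0, ..., a_m] are the coefficients of p.
    Returns (q_coeffs, remainder) with p(z) = (z - lam) * q(z) + remainder,
    where the remainder equals p(lam).
    """
    m = len(coeffs) - 1
    q = [0] * m
    carry = 0
    # Work down from the leading coefficient, as in Horner's scheme.
    for j in range(m, 0, -1):
        carry = coeffs[j] + lam * carry
        q[j - 1] = carry
    remainder = coeffs[0] + lam * carry
    return q, remainder


# p(z) = z^2 - 3z + 2 has the root 1, so the remainder is 0 and q(z) = z - 2.
print(divide_by_linear([2, -3, 1], 1))

# p(z) = z^3 - 1 with the root 1 gives q(z) = z^2 + z + 1.
print(divide_by_linear([-1, 0, 0, 1], 1))
```

When `lam` is not a root, the returned remainder is the nonzero value
$p(\lambda)$, so no factorization of the form 4.2 exists, in agreement
with the proposition.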

Now we can prove that polynomials do not have too many roots.

4.3 Corollary: Suppose $p \in \mathcal{P}(\mathbf{F})$ is a polynomial
with degree $m \ge 0$. Then $p$ has at most $m$ distinct roots in
$\mathbf{F}$.

Proof: If $m = 0$, then $p(z) = a_0 \ne 0$ and so $p$ has no roots. If
$m = 1$, then $p(z) = a_0 + a_1 z$, with $a_1 \ne 0$, and $p$ has exactly
one root, namely, $-a_0 / a_1$. Now suppose $m > 1$. We use induction on
$m$, assuming that every polynomial with degree $m - 1$ has at most
$m - 1$ distinct roots. If $p$ has no roots in $\mathbf{F}$, then we are
done. If $p$ has a root $\lambda \in \mathbf{F}$, then by 4.1 there is a
polynomial $q$ with degree $m - 1$ such that

\[ p(z) = (z - \lambda) q(z) \]

for all $z \in \mathbf{F}$. The equation above shows that if $p(z) = 0$,
then either $z = \lambda$ or $q(z) = 0$. In other words, the roots of $p$
consist of $\lambda$ and the roots of $q$. By our induction hypothesis,
$q$ has at most $m - 1$ distinct roots in $\mathbf{F}$. Thus $p$ has at
most $m$ distinct roots in $\mathbf{F}$. ■

The next result states that if a polynomial is identically 0, then all
its coefficients must be 0.

4.4 Corollary: Suppose $a_0, \dots, a_m \in \mathbf{F}$. If

\[ a_0 + a_1 z + a_2 z^2 + \dots + a_m z^m = 0 \]

for all $z \in \mathbf{F}$, then $a_0 = \dots = a_m = 0$.

Proof: Suppose $a_0 + a_1 z + a_2 z^2 + \dots + a_m z^m$ equals 0 for all
$z \in \mathbf{F}$. By 4.3, no nonnegative integer can be the degree of
this polynomial. Thus all the coefficients equal 0. ■

The corollary above implies that $(1, z, \dots, z^m)$ is linearly
independent in $\mathcal{P}(\mathbf{F})$ for every nonnegative integer
$m$. We had noted this earlier (in Chapter 2), but now we have a complete
proof. This linear independence implies that each polynomial can be
represented in only one way as a linear combination of functions of the
form $z^j$. In particular, the degree of a polynomial is unique.

If $p$ and $q$ are nonnegative integers, with $p \ne 0$, then there exist
nonnegative integers $s$ and $r$ such that

\[ q = sp + r \]

and $r < p$. Think of dividing $q$ by $p$, getting $s$ with remainder $r$.
Our next task is to prove an analogous result for polynomials.

Let $\deg p$ denote the degree of a polynomial $p$. The next result is
often called the division algorithm, though as stated here it is not
really an algorithm, just a useful lemma.

4.5 Division Algorithm: Suppose $p, q \in \mathcal{P}(\mathbf{F})$, with
$p \ne 0$. Then there exist polynomials
$s, r \in \mathcal{P}(\mathbf{F})$ such that

4.6    \[ q = sp + r \]

and $\deg r < \deg p$.

Think of 4.6 as giving the remainder $r$ when $q$ is divided by $p$.

Proof: Choose $s \in \mathcal{P}(\mathbf{F})$ such that $q - sp$ has
degree as small as possible. Let $r = q - sp$. Thus 4.6 holds, and all
that remains is to show that $\deg r < \deg p$. Suppose that
$\deg r \ge \deg p$. If $c \in \mathbf{F}$ and $j$ is a nonnegative
integer, then

\[ q - (s + cz^j) p = r - cz^j p. \]

Choose $j$ and $c$ so that the polynomial on the right side of this
equation has degree less than $\deg r$ (specifically, take
$j = \deg r - \deg p$ and then choose $c$ so that the coefficients of
$z^{\deg r}$ in $r$ and in $cz^j p$ are equal). This contradicts our choice
of $s$ as the polynomial that produces the smallest degree for
expressions of the form $q - sp$, completing the proof. ■
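
Although the proof above is an existence argument, its key step (subtract
$cz^j p$ to lower the degree of the remainder) can be repeated until the
degree drops below $\deg p$, which gives an actual procedure. The sketch
below is our own illustration: polynomials are coefficient lists
`[a_0, a_1, ...]`, coefficients are assumed rational so that `Fraction`
keeps the arithmetic exact, and `None` stands in for the degree $-\infty$
of the zero polynomial.

```python
from fractions import Fraction


def degree(p):
    """Degree of the coefficient list p; None represents the zero polynomial."""
    for j in range(len(p) - 1, -1, -1):
        if p[j] != 0:
            return j
    return None


def poly_divmod(q, p):
    """Return (s, r) with q = s*p + r and deg r < deg p; p must be nonzero."""
    deg_p = degree(p)
    if deg_p is None:
        raise ZeroDivisionError("p must not be the zero polynomial")
    r = [Fraction(c) for c in q]
    s = [Fraction(0)] * max(len(q) - deg_p, 1)
    while True:
        deg_r = degree(r)
        if deg_r is None or deg_r < deg_p:
            break
        # The step from the proof: j = deg r - deg p, and c matches the
        # leading coefficients of r and c * z^j * p.
        j = deg_r - deg_p
        c = r[deg_r] / p[deg_p]
        s[j] += c
        for i in range(deg_p + 1):
            r[i + j] -= c * p[i]
    return s, r


# Divide q(z) = z^3 + 2z + 1 by p(z) = z^2 + 1.
s, r = poly_divmod([1, 2, 0, 1], [1, 0, 1])
print(s, r)
```

In this example the quotient is $s(z) = z$ and the remainder is
$r(z) = z + 1$, since $z(z^2 + 1) + (z + 1) = z^3 + 2z + 1$.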

Complex Coefficients 

So far we have been handling polynomials with complex coefficients
and polynomials with real coefficients simultaneously through our
convention that $\mathbf{F}$ denotes $\mathbf{R}$ or $\mathbf{C}$. Now we
will see some differences between these two cases. In this section we
treat polynomials with complex coefficients. In the next section we will
use our results about polynomials with complex coefficients to prove
corresponding results for polynomials with real coefficients.

Though this chapter contains no linear algebra, the results so far
have nonetheless been proved using algebra. The next result, though
called the fundamental theorem of algebra, requires analysis for its
proof. The short proof presented here uses tools from complex analysis.
If you have not had a course in complex analysis, this proof will
almost certainly be meaningless to you. In that case, just accept the
fundamental theorem of algebra as something that we need to use but
whose proof requires more advanced tools that you may learn in later
courses.

4.7 Fundamental Theorem of Algebra: Every nonconstant polynomial
with complex coefficients has a root.

Proof: Let $p$ be a nonconstant polynomial with complex coefficients.
Suppose that $p$ has no roots. Then $1/p$ is an analytic function on
$\mathbf{C}$. Furthermore, $p(z) \to \infty$ as $z \to \infty$, which
implies that $1/p \to 0$ as $z \to \infty$. Thus $1/p$ is a bounded
analytic function on $\mathbf{C}$. By Liouville's theorem, any such
function must be constant. But if $1/p$ is constant, then $p$ is constant,
contradicting our assumption that $p$ is nonconstant. ■

The fundamental theorem of algebra leads to the following
factorization result for polynomials with complex coefficients. Note that
in this factorization, the numbers $\lambda_1, \dots, \lambda_m$ are
precisely the roots of $p$, for these are the only values of $z$ for which
the right side of 4.9 equals 0.


This is an existence 
theorem. The quadratic 
formula gives the roots 
explicitly for 
polynomials of 
degree 2. Similar but 
more complicated 
formulas exist for 
polynomials of degree 
3 and 4. No such 
formulas exist for 
polynomials of degree 
5 and above. 
4.8 Corollary: If $p \in \mathcal{P}(\mathbf{C})$ is a nonconstant
polynomial, then $p$ has a unique factorization (except for the order of
the factors) of the form

4.9    \[ p(z) = c(z - \lambda_1) \dots (z - \lambda_m), \]

where $c, \lambda_1, \dots, \lambda_m \in \mathbf{C}$.

Proof: Let $p \in \mathcal{P}(\mathbf{C})$ and let $m$ denote the degree
of $p$. We will use induction on $m$. If $m = 1$, then clearly the desired
factorization exists and is unique. So assume that $m > 1$ and that the
desired factorization exists and is unique for all polynomials of degree
$m - 1$.

First we will show that the desired factorization of $p$ exists. By the
fundamental theorem of algebra (4.7), $p$ has a root $\lambda$. By 4.1,
there is a polynomial $q$ with degree $m - 1$ such that

\[ p(z) = (z - \lambda) q(z) \]

for all $z \in \mathbf{C}$. Our induction hypothesis implies that $q$ has
the desired factorization, which when plugged into the equation above
gives the desired factorization of $p$.

Now we turn to the question of uniqueness. Clearly $c$ is uniquely
determined by 4.9; it must equal the coefficient of $z^m$ in $p$. So we
need only show that except for the order, there is only one way to choose
$\lambda_1, \dots, \lambda_m$. If

\[ (z - \lambda_1) \dots (z - \lambda_m) = (z - \tau_1) \dots (z - \tau_m) \]

for all $z \in \mathbf{C}$, then because the left side of the equation
above equals 0 when $z = \lambda_1$, one of the $\tau$'s on the right side
must equal $\lambda_1$. Relabeling, we can assume that
$\tau_1 = \lambda_1$. Now for $z \ne \lambda_1$, we can divide both sides
of the equation above by $z - \lambda_1$, getting

\[ (z - \lambda_2) \dots (z - \lambda_m) = (z - \tau_2) \dots (z - \tau_m) \]

for all $z \in \mathbf{C}$ except possibly $z = \lambda_1$. Actually the
equation above must hold for all $z \in \mathbf{C}$ because otherwise by
subtracting the right side from the left side we would get a nonzero
polynomial that has infinitely many roots. The equation above and our
induction hypothesis imply that except for the order, the $\lambda$'s are
the same as the $\tau$'s, completing the proof of the uniqueness. ■
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r ReaC Coefficients 

Before discussing polynomials with real coefficients, we need to 
learn a bit more about the complex numbers. 

Suppose $z = a + bi$, where $a$ and $b$ are real numbers. Then $a$ is
called the real part of $z$, denoted $\operatorname{Re} z$, and $b$ is
called the imaginary part of $z$, denoted $\operatorname{Im} z$. Thus
for every complex number $z$, we have

$z = \operatorname{Re} z + (\operatorname{Im} z) i$.

The complex conjugate of $z \in \mathbf{C}$, denoted $\bar{z}$, is
defined by

$\bar{z} = \operatorname{Re} z - (\operatorname{Im} z) i$.

For example, $\overline{2 + 3i} = 2 - 3i$.

The absolute value of a complex number $z$, denoted $|z|$, is defined
by

$|z| = \sqrt{(\operatorname{Re} z)^2 + (\operatorname{Im} z)^2}$.

For example, $|1 + 2i| = \sqrt{5}$. Note that $|z|$ is always a
nonnegative number.

You should verify that the real and imaginary parts, absolute value,
and complex conjugate have the following properties:

additivity of real part

$\operatorname{Re}(w + z) = \operatorname{Re} w + \operatorname{Re} z$ for all $w, z \in \mathbf{C}$;

additivity of imaginary part

$\operatorname{Im}(w + z) = \operatorname{Im} w + \operatorname{Im} z$ for all $w, z \in \mathbf{C}$;

sum of $z$ and $\bar{z}$

$z + \bar{z} = 2 \operatorname{Re} z$ for all $z \in \mathbf{C}$;

difference of $z$ and $\bar{z}$

$z - \bar{z} = 2 (\operatorname{Im} z) i$ for all $z \in \mathbf{C}$;

product of $z$ and $\bar{z}$

$z \bar{z} = |z|^2$ for all $z \in \mathbf{C}$;

additivity of complex conjugate

$\overline{w + z} = \bar{w} + \bar{z}$ for all $w, z \in \mathbf{C}$;

multiplicativity of complex conjugate

$\overline{wz} = \bar{w} \bar{z}$ for all $w, z \in \mathbf{C}$;

conjugate of conjugate

$\bar{\bar{z}} = z$ for all $z \in \mathbf{C}$;

multiplicativity of absolute value

$|wz| = |w| |z|$ for all $w, z \in \mathbf{C}$.

Note that $z = \bar{z}$ if and only if $z$ is a real number.

A polynomial with real coefficients may have no real roots. For
example, the polynomial $1 + x^2$ has no real roots. The failure of
the fundamental theorem of algebra for $\mathbf{R}$ accounts for the
differences between operators on real and complex vector spaces, as
we will see in later chapters.

Think about the connection between the quadratic formula and
Proposition 4.11.

In the next result, we need to think of a polynomial with real
coefficients as an element of $\mathcal{P}(\mathbf{C})$. This makes
sense because every real number is also a complex number.

4.10 Proposition: Suppose $p$ is a polynomial with real coefficients.
If $\lambda \in \mathbf{C}$ is a root of $p$, then so is
$\bar{\lambda}$.

Proof: Let

$p(z) = a_0 + a_1 z + \cdots + a_m z^m$,

where $a_0, \dots, a_m$ are real numbers. Suppose
$\lambda \in \mathbf{C}$ is a root of $p$. Then

$a_0 + a_1 \lambda + \cdots + a_m \lambda^m = 0$.

Take the complex conjugate of both sides of this equation, obtaining

$a_0 + a_1 \bar{\lambda} + \cdots + a_m \bar{\lambda}^m = 0$,

where we have used some of the basic properties of complex
conjugation listed earlier. The equation above shows that
$\bar{\lambda}$ is a root of $p$. ■
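
Proposition 4.10 can be checked numerically on a single example. The sketch below uses Python's built-in complex numbers; the polynomial $x^2 - 2x + 5$ and the helper name `evaluate` are choices made here for illustration. It also checks the identity $\overline{p(z)} = p(\bar{z})$, which is the step the proof relies on.

```python
def evaluate(coeffs, z):
    """Evaluate a_0 + a_1 z + ... + a_m z^m (coeffs = [a_0, ..., a_m])."""
    result = 0
    for a in reversed(coeffs):
        result = result * z + a
    return result


# p(x) = 5 - 2x + x^2 has real coefficients and the nonreal root 1 + 2i.
coeffs = [5, -2, 1]
root = 1 + 2j

value_at_root = evaluate(coeffs, root)
value_at_conjugate = evaluate(coeffs, root.conjugate())

# For real coefficients, conjugating p(z) equals evaluating p at the
# conjugate of z; z here is an arbitrary sample point.
z = 0.3 - 1.7j
gap = abs(evaluate(coeffs, z).conjugate() - evaluate(coeffs, z.conjugate()))
```

Both `value_at_root` and `value_at_conjugate` vanish, so the roots $1 + 2i$ and $1 - 2i$ appear as a conjugate pair.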

We want to prove a factorization theorem for polynomials with real 
coefficients. To do this, we begin by characterizing the polynomials 
with real coefficients and degree 2 that can be written as the product 
of two polynomials with real coefficients and degree 1. 

4.11 Proposition: Let $\alpha, \beta \in \mathbf{R}$. Then there is a
polynomial factorization of the form

4.12    $x^2 + \alpha x + \beta = (x - \lambda_1)(x - \lambda_2)$,

with $\lambda_1, \lambda_2 \in \mathbf{R}$, if and only if
$\alpha^2 \ge 4\beta$.

Proof: Notice that

4.13    $x^2 + \alpha x + \beta = \left(x + \frac{\alpha}{2}\right)^2 + \left(\beta - \frac{\alpha^2}{4}\right)$.

First suppose that $\alpha^2 < 4\beta$. Then clearly the right side of
the equation above is positive for every $x \in \mathbf{R}$, and hence
the polynomial $x^2 + \alpha x + \beta$ has no real roots. Thus no
factorization of the form 4.12, with
$\lambda_1, \lambda_2 \in \mathbf{R}$, can exist.

Conversely, now suppose that $\alpha^2 \ge 4\beta$. Thus there is a
real number $c$ such that $c^2 = \frac{\alpha^2}{4} - \beta$. From
4.13, we have

$x^2 + \alpha x + \beta = \left(x + \frac{\alpha}{2}\right)^2 - c^2$

$= \left(x + \frac{\alpha}{2} + c\right)\left(x + \frac{\alpha}{2} - c\right)$,

which gives the desired factorization. ■
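
The proof of 4.11 is constructive, and the construction can be sketched in a few lines of Python. The function name `factor_real_quadratic` is hypothetical; it returns $\lambda_1, \lambda_2$ when $\alpha^2 \ge 4\beta$ and `None` otherwise.

```python
import math


def factor_real_quadratic(alpha, beta):
    """Return (lam1, lam2) with x^2 + alpha*x + beta = (x - lam1)(x - lam2).

    Returns None when alpha^2 < 4*beta, the case in which 4.11 says that
    no factorization with real lam1, lam2 exists.
    """
    c_squared = alpha * alpha / 4 - beta   # the quantity c^2 in the proof
    if c_squared < 0:
        return None
    c = math.sqrt(c_squared)
    # (x + alpha/2 + c)(x + alpha/2 - c) = (x - lam1)(x - lam2)
    return (-alpha / 2 - c, -alpha / 2 + c)


# x^2 - 3x + 2 = (x - 1)(x - 2), while x^2 + 1 has no real factorization.
roots = factor_real_quadratic(-3, 2)
no_roots = factor_real_quadratic(0, 1)
```

Note that the formula for $\lambda_1, \lambda_2$ used here is the quadratic formula with leading coefficient 1.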

In the following theorem, each term of the form
$x^2 + \alpha_j x + \beta_j$, with $\alpha_j^2 < 4\beta_j$, cannot be
factored into the product of two polynomials with real coefficients
and degree 1 (by 4.11). Note that in the factorization below, the
numbers $\lambda_1, \dots, \lambda_m$ are precisely the real roots of
$p$, for these are the only real values of $x$ for which the right
side of the equation below equals 0.

4.14 Theorem: If $p \in \mathcal{P}(\mathbf{R})$ is a nonconstant
polynomial, then $p$ has a unique factorization (except for the order
of the factors) of the form

$p(x) = c(x - \lambda_1) \cdots (x - \lambda_m)(x^2 + \alpha_1 x + \beta_1) \cdots (x^2 + \alpha_M x + \beta_M)$,

where $c, \lambda_1, \dots, \lambda_m \in \mathbf{R}$ and
$(\alpha_1, \beta_1), \dots, (\alpha_M, \beta_M) \in \mathbf{R}^2$
with $\alpha_j^2 < 4\beta_j$ for each $j$.
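
For a concrete instance of this factorization, take $x^4 - 1 = (x - 1)(x + 1)(x^2 + 1)$, so that $m = 2$, $M = 1$, and $c = 1$. The Python sketch below multiplies the factors back together to confirm the identity; the example polynomial and the helper name `multiply` are choices made here for illustration.

```python
def multiply(p, q):
    """Multiply two polynomials given as coefficient lists [a_0, a_1, ...]."""
    product = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            product[i + j] += a * b
    return product


# Linear factors (x - 1) and (x + 1); one quadratic factor x^2 + 0x + 1,
# which satisfies alpha^2 = 0 < 4 = 4*beta and so has no real roots.
linear_factors = [[-1, 1], [1, 1]]
quadratic_factors = [[1, 0, 1]]

result = [1]   # the constant c
for factor in linear_factors + quadratic_factors:
    result = multiply(result, factor)
```

The list `result` holds the coefficients of $x^4 - 1$, constant term first.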

In the factorization of 4.14, either $m$ or $M$ may equal 0.

Proof: Let $p \in \mathcal{P}(\mathbf{R})$ be a nonconstant
polynomial. We can think of $p$ as an element of
$\mathcal{P}(\mathbf{C})$ (because every real number is a complex
number). The idea of the proof is to use the factorization 4.8 of $p$
as a polynomial with complex coefficients. Complex but nonreal roots
of $p$ come in pairs; see 4.10. Thus if the factorization of $p$ as an
element of $\mathcal{P}(\mathbf{C})$ includes terms of the form
$(x - \lambda)$ with $\lambda$ a nonreal complex number, then
$(x - \bar{\lambda})$ is also a term in the factorization. Combining
these two terms, we get a quadratic term of the required form.

The idea sketched in the paragraph above almost provides a proof of
the existence of our desired factorization. However, we need to be
careful about one point. Suppose $\lambda$ is a nonreal complex number
and $(x - \lambda)$ is a term in the factorization of $p$ as an
element of $\mathcal{P}(\mathbf{C})$. We are guaranteed by 4.10 that
$(x - \bar{\lambda})$ also appears as a term in the factorization, but
4.10 does not state that these two factors appear the same number of
times, as needed to make the idea above work. However, all is well.
We can write

$p(x) = (x - \lambda)(x - \bar{\lambda}) q(x)$

$= (x^2 - 2(\operatorname{Re} \lambda) x + |\lambda|^2) q(x)$


for some polynomial $q \in \mathcal{P}(\mathbf{C})$ with degree two
less than the degree of $p$. If we can prove that $q$ has real
coefficients, then, by using induction on the degree of $p$, we can
conclude that $(x - \bar{\lambda})$ appears in the factorization of
$p$ exactly as many times as $(x - \lambda)$.

To prove that $q$ has real coefficients, we solve the equation above
for $q$, getting

$q(x) = \frac{p(x)}{x^2 - 2(\operatorname{Re} \lambda) x + |\lambda|^2}$

for all $x \in \mathbf{R}$. (Here we are not dividing by 0 because the
roots of $x^2 - 2(\operatorname{Re} \lambda) x + |\lambda|^2$ are
$\lambda$ and $\bar{\lambda}$, neither of which is real.) The equation
above implies that $q(x) \in \mathbf{R}$ for all $x \in \mathbf{R}$.
Writing

$q(x) = a_0 + a_1 x + \cdots + a_{n-2} x^{n-2}$,

where $n$ is the degree of $p$ and
$a_0, \dots, a_{n-2} \in \mathbf{C}$, we thus have

$0 = \operatorname{Im} q(x) = (\operatorname{Im} a_0) + (\operatorname{Im} a_1) x + \cdots + (\operatorname{Im} a_{n-2}) x^{n-2}$

for all $x \in \mathbf{R}$. This implies that
$\operatorname{Im} a_0, \dots, \operatorname{Im} a_{n-2}$ all equal 0
(by 4.4). Thus all the coefficients of $q$ are real, as desired, and
hence the desired factorization exists.

Now we turn to the question of uniqueness of our factorization. A
factor of $p$ of the form $x^2 + \alpha x + \beta$ with
$\alpha^2 < 4\beta$ can be uniquely written as
$(x - \lambda)(x - \bar{\lambda})$ with $\lambda \in \mathbf{C}$. A
moment’s thought shows that two different factorizations of $p$ as an
element of $\mathcal{P}(\mathbf{R})$ would lead to two different
factorizations of $p$ as an element of $\mathcal{P}(\mathbf{C})$,
contradicting 4.8. ■

Exercises

1. Suppose $m$ and $n$ are positive integers with $m \le n$. Prove
that there exists a polynomial $p \in \mathcal{P}_n(\mathbf{F})$ with
exactly $m$ distinct roots.

2. Suppose that $z_1, \dots, z_{m+1}$ are distinct elements of
$\mathbf{F}$ and that $w_1, \dots, w_{m+1} \in \mathbf{F}$. Prove that
there exists a unique polynomial $p \in \mathcal{P}_m(\mathbf{F})$
such that

$p(z_j) = w_j$

for $j = 1, \dots, m + 1$.

3. Prove that if $p, q \in \mathcal{P}(\mathbf{F})$, with $p \ne 0$,
then there exist unique polynomials
$s, r \in \mathcal{P}(\mathbf{F})$ such that

$q = sp + r$

and $\deg r < \deg p$. In other words, add a uniqueness statement to
the division algorithm (4.5).

4. Suppose $p \in \mathcal{P}(\mathbf{C})$ has degree $m$. Prove that
$p$ has $m$ distinct roots if and only if $p$ and its derivative $p'$
have no roots in common.

5. Prove that every polynomial with odd degree and real coefficients 
has a real root. 



Chapter 5 


Eigenvalues and Eigenvectors


In Chapter 3 we studied linear maps from one vector space to another
vector space. Now we begin our investigation of linear maps from a
vector space to itself. Their study constitutes the deepest and most
important part of linear algebra. Most of the key results in this area
do not hold for infinite-dimensional vector spaces, so we work only on
finite-dimensional vector spaces. To avoid trivialities we also want
to eliminate the vector space {0} from consideration. Thus we make the
following assumption: $V$ denotes a finite-dimensional, nonzero vector
space over $\mathbf{F}$.

The most famous unsolved problem in functional analysis is called the
invariant subspace problem. It deals with invariant subspaces of
operators on infinite-dimensional vector spaces.


Invariant Subspaces 

In this chapter we develop the tools that will help us understand the
structure of operators. Recall that an operator is a linear map from a
vector space to itself. Recall also that we denote the set of
operators on $V$ by $\mathcal{L}(V)$; in other words,
$\mathcal{L}(V) = \mathcal{L}(V, V)$.

Let’s see how we might better understand what an operator looks
like. Suppose $T \in \mathcal{L}(V)$. If we have a direct sum
decomposition

5.1    $V = U_1 \oplus \cdots \oplus U_m$,

where each $U_j$ is a proper subspace of $V$, then to understand the
behavior of $T$, we need only understand the behavior of each
$T|_{U_j}$; here $T|_{U_j}$ denotes the restriction of $T$ to the
smaller domain $U_j$. Dealing with $T|_{U_j}$ should be easier than
dealing with $T$ because $U_j$ is a smaller vector space than $V$.
However, if we intend to apply tools useful in the study of operators
(such as taking powers), then we have a problem: $T|_{U_j}$ may not
map $U_j$ into itself; in other words, $T|_{U_j}$ may not be an
operator on $U_j$. Thus we are led to consider only decompositions of
the form 5.1 where $T$ maps each $U_j$ into itself.

The notion of a subspace that gets mapped into itself is sufficiently
important to deserve a name. Thus, for $T \in \mathcal{L}(V)$ and $U$
a subspace of $V$, we say that $U$ is invariant under $T$ if
$u \in U$ implies $Tu \in U$. In other words, $U$ is invariant under
$T$ if $T|_U$ is an operator on $U$. For example, if $T$ is the
operator of differentiation on $\mathcal{P}_7(\mathbf{R})$, then
$\mathcal{P}_4(\mathbf{R})$ (which is a subspace of
$\mathcal{P}_7(\mathbf{R})$) is invariant under $T$ because the
derivative of any polynomial of degree at most 4 is also a polynomial
with degree at most 4.
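
The differentiation example can be made concrete by representing a polynomial as its list of coefficients. In the Python sketch below, the helper names `derivative` and `degree` are hypothetical, and `degree` returns $-1$ for the zero polynomial purely as a convenience of this sketch.

```python
def derivative(coeffs):
    """Differentiate a_0 + a_1 x + ... + a_n x^n, given as [a_0, ..., a_n]."""
    return [k * a for k, a in enumerate(coeffs)][1:] or [0]


def degree(coeffs):
    """Largest index with a nonzero coefficient (-1 for the zero polynomial)."""
    nonzero = [k for k, a in enumerate(coeffs) if a != 0]
    return max(nonzero) if nonzero else -1


p = [3, 0, -2, 0, 5]     # 3 - 2x^2 + 5x^4, a polynomial of degree at most 4
dp = derivative(p)       # -4x + 20x^3
```

Differentiation never raises the degree, so applying it to a polynomial of degree at most 4 again gives a polynomial of degree at most 4.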

Let’s look at some easy examples of invariant subspaces. Suppose
$T \in \mathcal{L}(V)$. Clearly {0} is invariant under $T$. Also, the
whole space $V$ is obviously invariant under $T$. Must $T$ have any
invariant subspaces other than {0} and $V$? Later we will see that
this question has an affirmative answer for operators on complex
vector spaces with dimension greater than 1 and also for operators on
real vector spaces with dimension greater than 2.

If $T \in \mathcal{L}(V)$, then null $T$ is invariant under $T$
(proof: if $u \in$ null $T$, then $Tu = 0$, and hence $Tu \in$
null $T$). Also, range $T$ is invariant under $T$ (proof: if $u \in$
range $T$, then $Tu$ is also in range $T$, by the definition of
range). Although null $T$ and range $T$ are invariant under $T$, they
do not necessarily provide easy answers to the question about the
existence of invariant subspaces other than {0} and $V$ because
null $T$ may equal {0} and range $T$ may equal $V$ (this happens when
$T$ is invertible).

We will return later to a deeper study of invariant subspaces. Now we
turn to an investigation of the simplest possible nontrivial invariant
subspaces: invariant subspaces with dimension 1.

How does an operator behave on an invariant subspace of dimension 1?
Subspaces of $V$ of dimension 1 are easy to describe. Take any nonzero
vector $u \in V$ and let $U$ equal the set of all scalar multiples
of $u$:

5.2    $U = \{au : a \in \mathbf{F}\}$.

Then $U$ is a one-dimensional subspace of $V$, and every
one-dimensional subspace of $V$ is of this form. If $u \in V$ and the
subspace $U$ defined by 5.2 is invariant under
$T \in \mathcal{L}(V)$, then $Tu$ must be in $U$, and hence there must
be a scalar $\lambda \in \mathbf{F}$ such that $Tu = \lambda u$.
Conversely, if $u$ is a nonzero vector in $V$ such that
$Tu = \lambda u$ for some $\lambda \in \mathbf{F}$, then the subspace
$U$ defined by 5.2 is a one-dimensional subspace of $V$ invariant
under $T$.

The equation

5.3    $Tu = \lambda u$,

which we have just seen is intimately connected with one-dimensional
invariant subspaces, is important enough that the vectors $u$ and
scalars $\lambda$ satisfying it are given special names. Specifically,
a scalar $\lambda \in \mathbf{F}$ is called an eigenvalue of
$T \in \mathcal{L}(V)$ if there exists a nonzero vector $u \in V$ such
that $Tu = \lambda u$. We must require $u$ to be nonzero because with
$u = 0$ every scalar $\lambda \in \mathbf{F}$ satisfies 5.3. The
comments above show that $T$ has a one-dimensional invariant subspace
if and only if $T$ has an eigenvalue.

The equation $Tu = \lambda u$ is equivalent to
$(T - \lambda I)u = 0$, so $\lambda$ is an eigenvalue of $T$ if and
only if $T - \lambda I$ is not injective. By 3.21, $\lambda$ is an
eigenvalue of $T$ if and only if $T - \lambda I$ is not invertible,
and this happens if and only if $T - \lambda I$ is not surjective.

Suppose $T \in \mathcal{L}(V)$ and $\lambda \in \mathbf{F}$ is an
eigenvalue of $T$. A vector $u \in V$ is called an eigenvector of $T$
(corresponding to $\lambda$) if $Tu = \lambda u$. Because 5.3 is
equivalent to $(T - \lambda I)u = 0$, we see that the set of
eigenvectors of $T$ corresponding to $\lambda$ equals
null$(T - \lambda I)$. In particular, the set of eigenvectors of $T$
corresponding to $\lambda$ is a subspace of $V$.


These one-dimensional subspaces are loosely connected to the subject
of Herbert Marcuse’s well-known book One-Dimensional Man.

The regrettable word eigenvalue is half-German, half-English. The
German adjective eigen means own in the sense of characterizing some
intrinsic property. Some mathematicians use the term characteristic
value instead of eigenvalue.

Some texts define eigenvectors as we have, except that 0 is declared
not to be an eigenvector. With the definition used here, the set of
eigenvectors corresponding to a fixed eigenvalue is a subspace.


Let’s look at some examples of eigenvalues and eigenvectors. If
$a \in \mathbf{F}$, then $aI$ has only one eigenvalue, namely, $a$,
and every vector is an eigenvector for this eigenvalue.

For a more complicated example, consider the operator
$T \in \mathcal{L}(\mathbf{F}^2)$ defined by

5.4    $T(w, z) = (-z, w)$.

If $\mathbf{F} = \mathbf{R}$, then this operator has a nice geometric
interpretation: $T$ is just a counterclockwise rotation by 90° about
the origin in $\mathbf{R}^2$. An operator has an eigenvalue if and
only if there exists a nonzero vector in its domain that gets sent by
the operator to a scalar multiple of itself. The rotation of a nonzero
vector in $\mathbf{R}^2$ obviously never equals a scalar multiple of
itself. Conclusion: if $\mathbf{F} = \mathbf{R}$, the operator $T$
defined by 5.4 has no eigenvalues. However, if
$\mathbf{F} = \mathbf{C}$, the story changes. To find eigenvalues of
$T$, we must find the scalars $\lambda$ such that

$T(w, z) = \lambda (w, z)$

has some solution other than $w = z = 0$. For $T$ defined by 5.4, the
equation above is equivalent to the simultaneous equations

5.5    $-z = \lambda w, \quad w = \lambda z$.

Substituting the value for $w$ given by the second equation into the
first equation gives

$-z = \lambda^2 z$.

Now $z$ cannot equal 0 (otherwise 5.5 implies that $w = 0$; we are
looking for solutions to 5.5 where $(w, z)$ is not the 0 vector), so
the equation above leads to the equation

$-1 = \lambda^2$.

The solutions to this equation are $\lambda = i$ or $\lambda = -i$.
You should be able to verify easily that $i$ and $-i$ are eigenvalues
of $T$. Indeed, the eigenvectors corresponding to the eigenvalue $i$
are the vectors of the form $(w, -wi)$, with $w \in \mathbf{C}$, and
the eigenvectors corresponding to the eigenvalue $-i$ are the vectors
of the form $(w, wi)$, with $w \in \mathbf{C}$.
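
The eigenvalue computation for the operator 5.4 can be verified directly with Python's built-in complex numbers. This is only a sketch; the helper name `is_eigenpair` and the sample value of $w$ are chosen here for illustration.

```python
def T(w, z):
    """The operator 5.4: T(w, z) = (-z, w)."""
    return (-z, w)


def is_eigenpair(lam, vec):
    """Check that vec is nonzero and T(vec) equals lam times vec."""
    w, z = vec
    return vec != (0, 0) and T(w, z) == (lam * w, lam * z)


w = 2 + 1j   # an arbitrary nonzero sample value

# Eigenvalue i with eigenvector (w, -wi); eigenvalue -i with (w, wi).
pair_plus = is_eigenpair(1j, (w, -w * 1j))
pair_minus = is_eigenpair(-1j, (w, w * 1j))

# The vector (1, 0) is rotated to (0, 1), which is not i times (1, 0).
not_a_pair = is_eigenpair(1j, (1, 0))
```

The sample value has small integer real and imaginary parts, so the complex arithmetic here is exact and the equality tests are reliable; for general floating-point data a tolerance would be needed.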

Now we show that nonzero eigenvectors corresponding to distinct
eigenvalues are linearly independent.

5.6 Theorem: Let $T \in \mathcal{L}(V)$. Suppose
$\lambda_1, \dots, \lambda_m$ are distinct eigenvalues of $T$ and
$v_1, \dots, v_m$ are corresponding nonzero eigenvectors. Then
$(v_1, \dots, v_m)$ is linearly independent.

Proof: Suppose $(v_1, \dots, v_m)$ is linearly dependent. Let $k$ be
the smallest positive integer such that

5.7    $v_k \in \operatorname{span}(v_1, \dots, v_{k-1})$;

the existence of $k$ with this property follows from the linear
dependence lemma (2.4). Thus there exist
$a_1, \dots, a_{k-1} \in \mathbf{F}$ such that

5.8    $v_k = a_1 v_1 + \cdots + a_{k-1} v_{k-1}$.

Apply $T$ to both sides of this equation, getting

$\lambda_k v_k = a_1 \lambda_1 v_1 + \cdots + a_{k-1} \lambda_{k-1} v_{k-1}$.

Multiply both sides of 5.8 by $\lambda_k$ and then subtract the
equation above, getting

$0 = a_1 (\lambda_k - \lambda_1) v_1 + \cdots + a_{k-1} (\lambda_k - \lambda_{k-1}) v_{k-1}$.

Because we chose $k$ to be the smallest positive integer satisfying
5.7, $(v_1, \dots, v_{k-1})$ is linearly independent. Thus the
equation above implies that all the $a$'s are 0 (recall that
$\lambda_k$ is not equal to any of
$\lambda_1, \dots, \lambda_{k-1}$). However, this means that $v_k$
equals 0 (see 5.8), contradicting our hypothesis that all the $v$'s
are nonzero. Therefore our assumption that $(v_1, \dots, v_m)$ is
linearly dependent must have been false. ■

The corollary below states that an operator cannot have more distinct
eigenvalues than the dimension of the vector space on which it acts.

5.9 Corollary: Each operator on $V$ has at most $\dim V$ distinct
eigenvalues.

Proof: Let $T \in \mathcal{L}(V)$. Suppose that
$\lambda_1, \dots, \lambda_m$ are distinct eigenvalues of $T$. Let
$v_1, \dots, v_m$ be corresponding nonzero eigenvectors. The last
theorem implies that $(v_1, \dots, v_m)$ is linearly independent. Thus
$m \le \dim V$ (see 2.6), as desired. ■

Polynomials Applied to Operators


The main reason that a richer theory exists for operators (which map
a vector space into itself) than for linear maps is that operators can
be raised to powers. In this section we define that notion and the key
concept of applying a polynomial to an operator.

If $T \in \mathcal{L}(V)$, then $TT$ makes sense and is also in
$\mathcal{L}(V)$. We usually write $T^2$ instead of $TT$. More
generally, if $m$ is a positive integer, then $T^m$ is defined by

$T^m = \underbrace{T \cdots T}_{m \text{ times}}$.

For convenience we define $T^0$ to be the identity operator $I$ on
$V$.

Recall from Chapter 3 that if $T$ is an invertible operator, then the
inverse of $T$ is denoted by $T^{-1}$. If $m$ is a positive integer,
then we define $T^{-m}$ to be $(T^{-1})^m$.

You should verify that if $T$ is an operator, then

$T^m T^n = T^{m+n}$ and $(T^m)^n = T^{mn}$,

where $m$ and $n$ are allowed to be arbitrary integers if $T$ is
invertible and nonnegative integers if $T$ is not invertible.

If $T \in \mathcal{L}(V)$ and $p \in \mathcal{P}(\mathbf{F})$ is a
polynomial given by

$p(z) = a_0 + a_1 z + a_2 z^2 + \cdots + a_m z^m$

for $z \in \mathbf{F}$, then $p(T)$ is the operator defined by

$p(T) = a_0 I + a_1 T + a_2 T^2 + \cdots + a_m T^m$.

For example, if $p$ is the polynomial defined by $p(z) = z^2$ for
$z \in \mathbf{F}$, then $p(T) = T^2$. This is a new use of the symbol
$p$ because we are applying it to operators, not just elements of
$\mathbf{F}$. If we fix an operator $T \in \mathcal{L}(V)$, then the
function from $\mathcal{P}(\mathbf{F})$ to $\mathcal{L}(V)$ given by
$p \mapsto p(T)$ is linear, as you should verify.

If $p$ and $q$ are polynomials with coefficients in $\mathbf{F}$,
then $pq$ is the polynomial defined by

$(pq)(z) = p(z) q(z)$

for $z \in \mathbf{F}$. You should verify that we have the following
nice multiplicative property: if $T \in \mathcal{L}(V)$, then

$(pq)(T) = p(T) q(T)$

for all polynomials $p$ and $q$ with coefficients in $\mathbf{F}$.
Note that any two polynomials in $T$ commute, meaning that
$p(T) q(T) = q(T) p(T)$, because

$p(T) q(T) = (pq)(T) = (qp)(T) = q(T) p(T)$.
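
The multiplicative property can be checked on a small example by identifying an operator on $\mathbf{F}^2$ with its matrix with respect to the standard basis. The Python sketch below uses plain nested lists; the helper names (`mat_mul`, `identity`, `apply_poly`, `poly_mul`) and the sample matrix are hypothetical choices for illustration.

```python
def mat_mul(A, B):
    """Product of two n-by-n matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def identity(n):
    """The n-by-n identity matrix."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def apply_poly(coeffs, T):
    """Return p(T) = a_0 I + a_1 T + ... + a_m T^m for coeffs [a_0, ..., a_m]."""
    n = len(T)
    result = [[0] * n for _ in range(n)]
    power = identity(n)                       # T^0 = I
    for a in coeffs:
        result = [[result[i][j] + a * power[i][j] for j in range(n)]
                  for i in range(n)]
        power = mat_mul(power, T)             # next power of T
    return result


def poly_mul(p, q):
    """Coefficients of the product polynomial pq."""
    product = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            product[i + j] += a * b
    return product


T = [[1, 2], [0, 3]]
p = [1, 1]        # p(z) = 1 + z
q = [0, 0, 1]     # q(z) = z^2

lhs = apply_poly(poly_mul(p, q), T)               # (pq)(T)
rhs = mat_mul(apply_poly(p, T), apply_poly(q, T))  # p(T) q(T)
swapped = mat_mul(apply_poly(q, T), apply_poly(p, T))  # q(T) p(T)
```

All three matrices agree, as the identities above predict.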

Upper-Triangular Matrices

Now we come to one of the central results about operators on complex
vector spaces.

5.10 Theorem: Every operator on a finite-dimensional, nonzero,
complex vector space has an eigenvalue.

Proof: Suppose $V$ is a complex vector space with dimension $n > 0$
and $T \in \mathcal{L}(V)$. Choose $v \in V$ with $v \ne 0$. Then

$(v, Tv, T^2 v, \dots, T^n v)$

cannot be linearly independent because $V$ has dimension $n$ and we
have $n + 1$ vectors. Thus there exist complex numbers
$a_0, \dots, a_n$, not all 0, such that

$0 = a_0 v + a_1 Tv + \cdots + a_n T^n v$.

Let $m$ be the largest index such that $a_m \ne 0$. Because
$v \ne 0$, the coefficients $a_1, \dots, a_n$ cannot all be 0
(otherwise the equation above would reduce to $a_0 v = 0$, forcing
$a_0 = 0$ as well), so $0 < m \le n$. Make the $a$'s the coefficients
of a polynomial, which can be written in factored form (see 4.8) as

$a_0 + a_1 z + \cdots + a_n z^n = c(z - \lambda_1) \cdots (z - \lambda_m)$,

where $c$ is a nonzero complex number, each
$\lambda_j \in \mathbf{C}$, and the equation holds for all
$z \in \mathbf{C}$. We then have

$0 = a_0 v + a_1 Tv + \cdots + a_n T^n v$

$= (a_0 I + a_1 T + \cdots + a_n T^n) v$

$= c(T - \lambda_1 I) \cdots (T - \lambda_m I) v$,

which means that $T - \lambda_j I$ is not injective for at least one
$j$. In other words, $T$ has an eigenvalue. ■


Compare the simple proof of Theorem 5.10 given here with the standard
proof using determinants. With the standard proof, first the difficult
concept of determinants must be defined, then an operator with 0
determinant must be shown to be not invertible, then the
characteristic polynomial needs to be defined, and by the time the
proof of this theorem is reached, no insight remains about why it is
true.

Recall that in Chapter 3 we discussed the matrix of a linear map from
one vector space to another vector space. This matrix depended on a
choice of a basis for each of the two vector spaces. Now that we are
studying operators, which map a vector space to itself, we need only
one basis. In addition, now our matrices will be square arrays, rather
than the more general rectangular arrays that we considered earlier.
Specifically, let $T \in \mathcal{L}(V)$. Suppose $(v_1, \dots, v_n)$
is a basis of $V$. For each $k = 1, \dots, n$, we can write

$Tv_k = a_{1,k} v_1 + \cdots + a_{n,k} v_n$,

where $a_{j,k} \in \mathbf{F}$ for $j = 1, \dots, n$. The $n$-by-$n$
matrix

5.11    $\begin{bmatrix} a_{1,1} & \dots & a_{1,n} \\ \vdots & & \vdots \\ a_{n,1} & \dots & a_{n,n} \end{bmatrix}$

is called the matrix of $T$ with respect to the basis
$(v_1, \dots, v_n)$; we denote it by
$\mathcal{M}(T, (v_1, \dots, v_n))$ or just by $\mathcal{M}(T)$ if the
basis $(v_1, \dots, v_n)$ is clear from the context (for example, if
only one basis is in sight).

The $k$th column of the matrix is formed from the coefficients used to
write $Tv_k$ as a linear combination of the $v$'s.

If $T$ is an operator on $\mathbf{F}^n$ and no basis is specified, you
should assume that the basis in question is the standard one (where
the $j$th basis vector is 1 in the $j$th slot and 0 in all the other
slots). You can then think of the $j$th column of $\mathcal{M}(T)$ as
$T$ applied to the $j$th basis vector.

A central goal of linear algebra is to show that given an operator
$T \in \mathcal{L}(V)$, there exists a basis of $V$ with respect to
which $T$ has a reasonably simple matrix. To make this vague
formulation (“reasonably simple” is not precise language) a bit more
concrete, we might try to make $\mathcal{M}(T)$ have many 0’s.

If $V$ is a complex vector space, then we already know enough to show
that there is a basis of $V$ with respect to which the matrix of $T$
has 0’s everywhere in the first column, except possibly the first
entry. In other words, there is a basis of $V$ with respect to which
the matrix of $T$ looks like

$\begin{bmatrix} \lambda & \\ 0 & * \\ \vdots & \\ 0 & \end{bmatrix}$;

here the $*$ denotes the entries in all the columns other than the
first column. To prove this, let $\lambda$ be an eigenvalue of $T$
(one exists by 5.10) and let $v$ be a corresponding nonzero
eigenvector. Extend $(v)$ to a basis of $V$. Then the matrix of $T$
with respect to this basis has the form above. Soon we will see that
we can choose a basis of $V$ with respect to which the matrix of $T$
has even more 0’s.

We often use $*$ to denote matrix entries that we do not know about or
that are irrelevant to the questions being discussed.

The diagonal of a square matrix consists of the entries along the
straight line from the upper left corner to the bottom right corner.
For example, the diagonal of the matrix 5.11 consists of the entries
$a_{1,1}, a_{2,2}, \dots, a_{n,n}$.

A matrix is called upper triangular if all the entries below the
diagonal equal 0. For example, the 4-by-4 matrix

$\begin{bmatrix} 6 & 2 & 7 & 5 \\ 0 & 6 & 1 & 3 \\ 0 & 0 & 7 & 9 \\ 0 & 0 & 0 & 8 \end{bmatrix}$

is upper triangular. Typically we represent an upper-triangular matrix
in the form

$\begin{bmatrix} \lambda_1 & & * \\ & \ddots & \\ 0 & & \lambda_n \end{bmatrix}$;

the 0 in the matrix above indicates that all entries below the
diagonal in this $n$-by-$n$ matrix equal 0. Upper-triangular matrices
can be considered reasonably simple: for $n$ large, an $n$-by-$n$
upper-triangular matrix has almost half its entries equal to 0.
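
The definition translates directly into a test on the entries below the diagonal. In the Python sketch below, a matrix is a list of rows and the function name `is_upper_triangular` is hypothetical; the count at the end illustrates the "almost half" remark for the 4-by-4 example.

```python
def is_upper_triangular(matrix):
    """True when every entry below the diagonal (row j, column k < j) is 0."""
    n = len(matrix)
    return all(matrix[j][k] == 0 for j in range(n) for k in range(j))


example = [
    [6, 2, 7, 5],
    [0, 6, 1, 3],
    [0, 0, 7, 9],
    [0, 0, 0, 8],
]

n = len(example)
# Number of positions strictly below the diagonal: n(n - 1)/2 out of n^2.
below_diagonal = sum(1 for j in range(n) for k in range(j))
```

For $n = 4$ there are 6 positions below the diagonal out of 16 entries; the fraction $n(n-1)/(2n^2)$ approaches $1/2$ as $n$ grows.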

The following proposition demonstrates a useful connection between
upper-triangular matrices and invariant subspaces.

5.12 Proposition: Suppose $T \in \mathcal{L}(V)$ and
$(v_1, \dots, v_n)$ is a basis of $V$. Then the following are
equivalent:

(a) the matrix of $T$ with respect to $(v_1, \dots, v_n)$ is upper triangular;

(b) $Tv_k \in \operatorname{span}(v_1, \dots, v_k)$ for each $k = 1, \dots, n$;

(c) $\operatorname{span}(v_1, \dots, v_k)$ is invariant under $T$ for each $k = 1, \dots, n$.

Proof: The equivalence of (a) and (b) follows easily from the
definitions and a moment’s thought. Obviously (c) implies (b). Thus to
complete the proof, we need only prove that (b) implies (c). So
suppose that (b) holds. Fix $k \in \{1, \dots, n\}$. From (b), we know
that

$Tv_1 \in \operatorname{span}(v_1) \subset \operatorname{span}(v_1, \dots, v_k)$;

$Tv_2 \in \operatorname{span}(v_1, v_2) \subset \operatorname{span}(v_1, \dots, v_k)$;

$\vdots$

$Tv_k \in \operatorname{span}(v_1, \dots, v_k)$.

Thus if $v$ is a linear combination of $(v_1, \dots, v_k)$, then

$Tv \in \operatorname{span}(v_1, \dots, v_k)$.

In other words, $\operatorname{span}(v_1, \dots, v_k)$ is invariant
under $T$, completing the proof. ■

Theorem 5.13 does not hold on real vector spaces because the first
vector in a basis with respect to which an operator has an
upper-triangular matrix must be an eigenvector of the operator. Thus
if an operator on a real vector space has no eigenvalues (we have seen
an example on $\mathbf{R}^2$), then there is no basis with respect to
which the operator has an upper-triangular matrix.

Now we can show that for each operator on a complex vector space,
there is a basis of the vector space with respect to which the matrix
of the operator has only 0’s below the diagonal. In Chapter 8 we will
improve even this result.

5.13 Theorem: Suppose $V$ is a complex vector space and
$T \in \mathcal{L}(V)$. Then $T$ has an upper-triangular matrix with
respect to some basis of $V$.

Proof: We will use induction on the dimension of $V$. Clearly the
desired result holds if $\dim V = 1$.

Suppose now that $\dim V > 1$ and the desired result holds for all
complex vector spaces whose dimension is less than the dimension of
$V$. Let $\lambda$ be any eigenvalue of $T$ (5.10 guarantees that $T$
has an eigenvalue). Let

$U = \operatorname{range}(T - \lambda I)$.

Because $T - \lambda I$ is not surjective (see 3.21),
$\dim U < \dim V$. Furthermore, $U$ is invariant under $T$. To prove
this, suppose $u \in U$. Then

$Tu = (T - \lambda I)u + \lambda u$.

Obviously $(T - \lambda I)u \in U$ (from the definition of $U$) and
$\lambda u \in U$. Thus the equation above shows that $Tu \in U$.
Hence $U$ is invariant under $T$, as claimed.

Thus $T|_U$ is an operator on $U$. By our induction hypothesis, there
is a basis $(u_1, \dots, u_m)$ of $U$ with respect to which $T|_U$ has
an upper-triangular matrix. Thus for each $j$ we have (using 5.12)

5.14    $Tu_j = (T|_U)(u_j) \in \operatorname{span}(u_1, \dots, u_j)$.

Extend $(u_1, \dots, u_m)$ to a basis
$(u_1, \dots, u_m, v_1, \dots, v_n)$ of $V$. For each $k$, we have

$Tv_k = (T - \lambda I)v_k + \lambda v_k$.

The definition of $U$ shows that
$(T - \lambda I)v_k \in U = \operatorname{span}(u_1, \dots, u_m)$.
Thus the equation above shows that

5.15    $Tv_k \in \operatorname{span}(u_1, \dots, u_m, v_1, \dots, v_k)$.

From 5.14 and 5.15, we conclude (using 5.12) that $T$ has an
upper-triangular matrix with respect to the basis
$(u_1, \dots, u_m, v_1, \dots, v_n)$. ■

How does one determine from looking at the matrix of an operator
whether the operator is invertible? If we are fortunate enough to have
a basis with respect to which the matrix of the operator is upper
triangular, then this problem becomes easy, as the following
proposition shows.

5.16 Proposition: Suppose $T \in \mathcal{L}(V)$ has an
upper-triangular matrix with respect to some basis of $V$. Then $T$ is
invertible if and only if all the entries on the diagonal of that
upper-triangular matrix are nonzero.

Proof: Suppose $(v_1, \dots, v_n)$ is a basis of $V$ with respect to
which $T$ has an upper-triangular matrix

5.17    $\mathcal{M}(T, (v_1, \dots, v_n)) = \begin{bmatrix} \lambda_1 & & * \\ & \ddots & \\ 0 & & \lambda_n \end{bmatrix}$.

We need to prove that $T$ is not invertible if and only if one of the
$\lambda_k$'s equals 0.

First we will prove that if one of the $\lambda_k$'s equals 0, then
$T$ is not invertible. If $\lambda_1 = 0$, then $Tv_1 = 0$ (from 5.17)
and hence $T$ is not invertible, as desired. So suppose that
$1 < k \le n$ and $\lambda_k = 0$. Then, as can be seen from 5.17, $T$
maps each of the vectors $v_1, \dots, v_{k-1}$ into
$\operatorname{span}(v_1, \dots, v_{k-1})$. Because $\lambda_k = 0$,
the matrix representation 5.17 also implies that
$Tv_k \in \operatorname{span}(v_1, \dots, v_{k-1})$. Thus we can
define a linear map

$S \colon \operatorname{span}(v_1, \dots, v_k) \to \operatorname{span}(v_1, \dots, v_{k-1})$

by $Sv = Tv$ for $v \in \operatorname{span}(v_1, \dots, v_k)$. In
other words, $S$ is just $T$ restricted to
$\operatorname{span}(v_1, \dots, v_k)$.

Note that $\operatorname{span}(v_1, \dots, v_k)$ has dimension $k$ and
$\operatorname{span}(v_1, \dots, v_{k-1})$ has dimension $k - 1$
(because $(v_1, \dots, v_n)$ is linearly independent). Because
$\operatorname{span}(v_1, \dots, v_k)$ has a larger dimension than
$\operatorname{span}(v_1, \dots, v_{k-1})$, no linear map from
$\operatorname{span}(v_1, \dots, v_k)$ to
$\operatorname{span}(v_1, \dots, v_{k-1})$ is injective (see 3.5).
Thus there exists a nonzero vector
$v \in \operatorname{span}(v_1, \dots, v_k)$ such that $Sv = 0$. Hence
$Tv = 0$, and thus $T$ is not invertible, as desired.

To prove the other direction, now suppose that $T$ is not invertible.
Thus $T$ is not injective (see 3.21), and hence there exists a nonzero
vector $v \in V$ such that $Tv = 0$. Because $(v_1, \dots, v_n)$ is a
basis of $V$, we can write

$v = a_1 v_1 + \cdots + a_k v_k$,

where $a_1, \dots, a_k \in \mathbf{F}$ and $a_k \ne 0$ (represent $v$
as a linear combination of $(v_1, \dots, v_n)$ and then choose $k$ to
be the largest index with a nonzero coefficient). Thus

$0 = Tv$

$= T(a_1 v_1 + \cdots + a_k v_k)$

$= (a_1 Tv_1 + \cdots + a_{k-1} Tv_{k-1}) + a_k Tv_k$.

The last term in parentheses is in
$\operatorname{span}(v_1, \dots, v_{k-1})$ (because of the
upper-triangular form of 5.17). Thus the last equation shows that
$a_k Tv_k \in \operatorname{span}(v_1, \dots, v_{k-1})$. Multiplying
by $1/a_k$, which is allowed because $a_k \ne 0$, we conclude that
$Tv_k \in \operatorname{span}(v_1, \dots, v_{k-1})$. Thus when $Tv_k$
is written as a linear combination of the basis $(v_1, \dots, v_n)$,
the coefficient of $v_k$ will be 0. In other words, $\lambda_k$ in
5.17 must be 0, completing the proof. ■


Powerful numeric techniques exist for finding good approximations to
the eigenvalues of an operator from its matrix.

Unfortunately no method exists for exactly computing the eigenvalues
of a typical operator from its matrix (with respect to an arbitrary
basis). However, if we are fortunate enough to find a basis with
respect to which the matrix of the operator is upper triangular, then
the problem of computing the eigenvalues becomes trivial, as the
following proposition shows.

5.18 Proposition: Suppose $T \in \mathcal{L}(V)$ has an
upper-triangular matrix with respect to some basis of $V$. Then the
eigenvalues of $T$ consist precisely of the entries on the diagonal of
that upper-triangular matrix.

Proof: Suppose $(v_1, \dots, v_n)$ is a basis of $V$ with respect to
which $T$ has an upper-triangular matrix

$\mathcal{M}(T, (v_1, \dots, v_n)) = \begin{bmatrix} \lambda_1 & & * \\ & \ddots & \\ 0 & & \lambda_n \end{bmatrix}$.

Let $\lambda \in \mathbf{F}$. Then

$\mathcal{M}(T - \lambda I, (v_1, \dots, v_n)) = \begin{bmatrix} \lambda_1 - \lambda & & * \\ & \ddots & \\ 0 & & \lambda_n - \lambda \end{bmatrix}$.

Hence $T - \lambda I$ is not invertible if and only if $\lambda$
equals one of the $\lambda_j$'s (see 5.16). In other words, $\lambda$
is an eigenvalue of $T$ if and only if $\lambda$ equals one of the
$\lambda_j$'s, as desired. ■
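
Propositions 5.16 and 5.18 together give a simple recipe for an upper-triangular matrix: read the eigenvalues off the diagonal, and test invertibility by checking the diagonal for zeros. The Python sketch below applies the recipe to the 4-by-4 example from earlier in this section; the helper names are hypothetical, and the invertibility test simply applies the criterion of 5.16 rather than proving it.

```python
def diagonal(matrix):
    """Entries along the diagonal of a square matrix (list of rows)."""
    return [matrix[j][j] for j in range(len(matrix))]


def is_invertible_upper_triangular(matrix):
    """Criterion 5.16, valid for upper-triangular input only:
    invertible if and only if no diagonal entry equals 0."""
    return all(entry != 0 for entry in diagonal(matrix))


def shift(matrix, lam):
    """Matrix of T - lam*I, obtained by subtracting lam on the diagonal."""
    n = len(matrix)
    return [[matrix[j][k] - (lam if j == k else 0) for k in range(n)]
            for j in range(n)]


M = [
    [6, 2, 7, 5],
    [0, 6, 1, 3],
    [0, 0, 7, 9],
    [0, 0, 0, 8],
]

# By 5.18 the eigenvalues are the distinct diagonal entries.
eigenvalues = sorted(set(diagonal(M)))

# For each eigenvalue lam, T - lam*I fails the criterion of 5.16.
all_singular = all(
    not is_invertible_upper_triangular(shift(M, lam)) for lam in eigenvalues
)
```

The diagonal entry 6 appears twice, but it counts as a single eigenvalue; this operator therefore has three distinct eigenvalues on a four-dimensional space.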


Diagonal Matrices 

A diagonal matrix is a square matrix that is 0 everywhere except
possibly along the diagonal. For example,

$\begin{bmatrix} 8 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 5 \end{bmatrix}$

is a diagonal matrix. Obviously every diagonal matrix is upper
triangular, although in general a diagonal matrix has many more 0’s
than an upper-triangular matrix.

An operator $T \in \mathcal{L}(V)$ has a diagonal matrix

$\begin{bmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{bmatrix}$

with respect to a basis $(v_1, \dots, v_n)$ of $V$ if and only if

$Tv_1 = \lambda_1 v_1$

$\vdots$

$Tv_n = \lambda_n v_n$;

this follows immediately from the definition of the matrix of an
operator with respect to a basis. Thus an operator
$T \in \mathcal{L}(V)$ has a diagonal matrix with respect to some
basis of $V$ if and only if $V$ has a basis consisting of
eigenvectors of $T$.

If an operator has a diagonal matrix with respect to some basis, 
then the entries along the diagonal are precisely the eigenvalues of the 
operator; this follows from 5.18 (or you may want to find an easier 
proof that works only for diagonal matrices). 

Unfortunately not every operator has a diagonal matrix with respect
to some basis. This sad state of affairs can arise even on complex vector
spaces. For example, consider $T \in \mathcal{L}(\mathbf{C}^2)$ defined by

\[ T(w, z) = (z, 0). \tag{5.19} \]

As you should verify, 0 is the only eigenvalue of this operator and
the corresponding set of eigenvectors is the one-dimensional subspace
$\{(w, 0) \in \mathbf{C}^2 : w \in \mathbf{C}\}$. Thus there are not enough linearly independent
eigenvectors of $T$ to form a basis of the two-dimensional space $\mathbf{C}^2$.
Hence $T$ does not have a diagonal matrix with respect to any basis
of $\mathbf{C}^2$.
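
For readers who want to check the claim about 5.19, here is one possible verification, written as a short worked derivation. It assumes only the definition $T(w, z) = (z, 0)$ and the definition of an eigenvector.

```latex
% Suppose T(w, z) = \lambda (w, z) with (w, z) \neq (0, 0).
\begin{align*}
(z, 0) = (\lambda w, \lambda z)
  &\implies z = \lambda w \ \text{and}\ \lambda z = 0 \\
  &\implies \lambda^2 w = 0.
\end{align*}
% If \lambda \neq 0, then w = 0 and hence z = \lambda w = 0,
% contradicting (w, z) \neq (0, 0). So \lambda = 0, which forces z = 0:
% the eigenvectors are exactly the nonzero vectors of the form (w, 0).
```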

The next proposition shows that if an operator has as many distinct
eigenvalues as the dimension of its domain, then the operator has a
diagonal matrix with respect to some basis. However, some operators
with fewer eigenvalues also have diagonal matrices (in other words, the
converse of the next proposition is not true). For example, the operator
$T$ defined on the three-dimensional space $\mathbf{F}^3$ by

\[ T(z_1, z_2, z_3) = (4z_1, 4z_2, 5z_3) \]

has only two eigenvalues (4 and 5), but this operator has a diagonal
matrix with respect to the standard basis.


Later we will find other 
conditions that imply 
that certain operators 
have a diagonal matrix 
with respect to some 
basis (see 7.9 and 7.13). 


5.20 Proposition: If $T \in \mathcal{L}(V)$ has $\dim V$ distinct eigenvalues, then
$T$ has a diagonal matrix with respect to some basis of $V$.

Proof: Suppose that $T \in \mathcal{L}(V)$ has $\dim V$ distinct eigenvalues
$\lambda_1, \dots, \lambda_{\dim V}$. For each $j$, let $v_j \in V$ be a nonzero eigenvector
corresponding to the eigenvalue $\lambda_j$. Because nonzero eigenvectors
corresponding to distinct eigenvalues are linearly independent (see 5.6),
$(v_1, \dots, v_{\dim V})$ is linearly independent. A linearly independent list of
$\dim V$ vectors in $V$ is a basis of $V$ (see 2.17); thus $(v_1, \dots, v_{\dim V})$ is a
basis of $V$. With respect to this basis consisting of eigenvectors, $T$ has
a diagonal matrix. ■

We close this section with a proposition giving several conditions 
on an operator that are equivalent to its having a diagonal matrix with 
respect to some basis. 


5.21 Proposition: Suppose $T \in \mathcal{L}(V)$. Let $\lambda_1, \dots, \lambda_m$ denote the
distinct eigenvalues of $T$. Then the following are equivalent:

(a) $T$ has a diagonal matrix with respect to some basis of $V$;

(b) $V$ has a basis consisting of eigenvectors of $T$;

(c) there exist one-dimensional subspaces $U_1, \dots, U_n$ of $V$, each
invariant under $T$, such that $V = U_1 \oplus \dots \oplus U_n$;

(d) $V = \operatorname{null}(T - \lambda_1 I) \oplus \dots \oplus \operatorname{null}(T - \lambda_m I)$;

(e) $\dim V = \dim \operatorname{null}(T - \lambda_1 I) + \dots + \dim \operatorname{null}(T - \lambda_m I)$.

For complex vector spaces, we will extend this list of equivalences
later (see Exercises 16 and 23 in Chapter 8).


Proof: We have already shown that (a) and (b) are equivalent.

Suppose that (b) holds; thus $V$ has a basis $(v_1, \dots, v_n)$ consisting of
eigenvectors of $T$. For each $j$, let $U_j = \operatorname{span}(v_j)$. Obviously each $U_j$
is a one-dimensional subspace of $V$ that is invariant under $T$ (because
each $v_j$ is an eigenvector of $T$). Because $(v_1, \dots, v_n)$ is a basis of $V$,
each vector in $V$ can be written uniquely as a linear combination of
$(v_1, \dots, v_n)$. In other words, each vector in $V$ can be written uniquely
as a sum $u_1 + \dots + u_n$, where each $u_j \in U_j$. Thus $V = U_1 \oplus \dots \oplus U_n$.
Hence (b) implies (c).

Suppose now that (c) holds; thus there are one-dimensional subspaces
$U_1, \dots, U_n$ of $V$, each invariant under $T$, such that

\[ V = U_1 \oplus \dots \oplus U_n. \]

For each $j$, let $v_j$ be a nonzero vector in $U_j$. Then each $v_j$ is an
eigenvector of $T$. Because each vector in $V$ can be written uniquely as a sum
$u_1 + \dots + u_n$, where each $u_j \in U_j$ (so each $u_j$ is a scalar multiple of $v_j$),
we see that $(v_1, \dots, v_n)$ is a basis of $V$. Thus (c) implies (b).

At this stage of the proof we know that (a), (b), and (c) are all
equivalent. We will finish the proof by showing that (b) implies (d), that (d)
implies (e), and that (e) implies (b).

Suppose that (b) holds; thus $V$ has a basis consisting of eigenvectors
of $T$. Thus every vector in $V$ is a linear combination of eigenvectors
of $T$. Hence

\[ V = \operatorname{null}(T - \lambda_1 I) + \dots + \operatorname{null}(T - \lambda_m I). \tag{5.22} \]

To show that the sum above is a direct sum, suppose that

\[ 0 = u_1 + \dots + u_m, \]

where each $u_j \in \operatorname{null}(T - \lambda_j I)$. Because nonzero eigenvectors
corresponding to distinct eigenvalues are linearly independent, this implies
(apply 5.6 to the sum of the nonzero vectors on the right side of the
equation above) that each $u_j$ equals 0. This implies (using 1.8) that the
sum in 5.22 is a direct sum, completing the proof that (b) implies (d).

That (d) implies (e) follows immediately from Exercise 17 in
Chapter 2.

Finally, suppose that (e) holds; thus

\[ \dim V = \dim \operatorname{null}(T - \lambda_1 I) + \dots + \dim \operatorname{null}(T - \lambda_m I). \tag{5.23} \]

Choose a basis of each $\operatorname{null}(T - \lambda_j I)$; put all these bases together to
form a list $(v_1, \dots, v_n)$ of eigenvectors of $T$, where $n = \dim V$ (by 5.23).
To show that this list is linearly independent, suppose

\[ a_1 v_1 + \dots + a_n v_n = 0, \]

where $a_1, \dots, a_n \in \mathbf{F}$. For each $j = 1, \dots, m$, let $u_j$ denote the sum of
all the terms $a_k v_k$ such that $v_k \in \operatorname{null}(T - \lambda_j I)$. Thus each $u_j$ is an
eigenvector of $T$ with eigenvalue $\lambda_j$, and

\[ u_1 + \dots + u_m = 0. \]

Because nonzero eigenvectors corresponding to distinct eigenvalues
are linearly independent, this implies (apply 5.6 to the sum of the
nonzero vectors on the left side of the equation above) that each $u_j$
equals 0. Because each $u_j$ is a sum of terms $a_k v_k$, where the $v_k$'s
were chosen to be a basis of $\operatorname{null}(T - \lambda_j I)$, this implies that all the $a_k$'s
equal 0. Thus $(v_1, \dots, v_n)$ is linearly independent and hence is a basis
of $V$ (by 2.17). Thus (e) implies (b), completing the proof. ■
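
Condition (e) of 5.21 is easy to test numerically once the distinct eigenvalues are known. The sketch below is one way to do that over the rationals; it assumes the eigenvalues are supplied by the caller, computes each $\dim \operatorname{null}(T - \lambda I)$ as $n$ minus the rank, and uses the hypothetical helper names `rank` and `satisfies_condition_e`.

```python
from fractions import Fraction

def rank(M):
    """Rank of a matrix (list of rows) by Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][c] / rows[r][c]
            rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def satisfies_condition_e(A, eigenvalues):
    """Check dim V == sum of dim null(T - lam I) over the given distinct eigenvalues."""
    n = len(A)
    total = 0
    for lam in eigenvalues:
        shifted = [[A[i][j] - (lam if i == j else 0) for j in range(n)]
                   for i in range(n)]
        total += n - rank(shifted)  # dimension of the null space
    return total == n

# The operator T(z1, z2, z3) = (4 z1, 4 z2, 5 z3) from the text: diagonalizable.
print(satisfies_condition_e([[4, 0, 0], [0, 4, 0], [0, 0, 5]], [4, 5]))
# The operator T(w, z) = (z, 0) from 5.19: not diagonalizable.
print(satisfies_condition_e([[0, 1], [0, 0]], [0]))
```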
Invariant Subspaces on Real Vector Spaces

We know that every operator on a complex vector space has an
eigenvalue (see 5.10 for the precise statement). We have also seen an example
showing that the analogous statement is false on real vector spaces. In
other words, an operator on a nonzero real vector space may have no
invariant subspaces of dimension 1. However, we now show that an
invariant subspace of dimension 1 or 2 always exists.

5.24 Theorem: Every operator on a finite-dimensional, nonzero, real 
vector space has an invariant subspace of dimension 1 or 2. 

Proof: Suppose $V$ is a real vector space with dimension $n > 0$ and
$T \in \mathcal{L}(V)$. Choose $v \in V$ with $v \neq 0$. Then

\[ (v, Tv, T^2 v, \dots, T^n v) \]

cannot be linearly independent because $V$ has dimension $n$ and we have
$n + 1$ vectors. Thus there exist real numbers $a_0, \dots, a_n$, not all 0, such
that

\[ 0 = a_0 v + a_1 T v + \dots + a_n T^n v. \]

Make the $a$'s the coefficients of a polynomial, which can be written in
factored form (see 4.14) as

\[
a_0 + a_1 x + \dots + a_n x^n
= c (x - \lambda_1) \cdots (x - \lambda_m)(x^2 + \alpha_1 x + \beta_1) \cdots (x^2 + \alpha_M x + \beta_M),
\]

where $c$ is a nonzero real number, each $\lambda_j$, $\alpha_j$, and $\beta_j$ is real, $m + M \geq 1$,
and the equation holds for all $x \in \mathbf{R}$. (Here either $m$ or $M$ might equal 0.)
We then have

\begin{align*}
0 &= a_0 v + a_1 T v + \dots + a_n T^n v \\
  &= (a_0 I + a_1 T + \dots + a_n T^n) v \\
  &= c (T - \lambda_1 I) \cdots (T - \lambda_m I)(T^2 + \alpha_1 T + \beta_1 I) \cdots (T^2 + \alpha_M T + \beta_M I) v,
\end{align*}


which means that $T - \lambda_j I$ is not injective for at least one $j$ or that
$(T^2 + \alpha_j T + \beta_j I)$ is not injective for at least one $j$. If $T - \lambda_j I$ is not
injective for at least one $j$, then $T$ has an eigenvalue and hence a
one-dimensional invariant subspace. Let's consider the other possibility. In
other words, suppose that $(T^2 + \alpha_j T + \beta_j I)$ is not injective for some $j$.
Thus there exists a nonzero vector $u \in V$ such that

\[ T^2 u + \alpha_j T u + \beta_j u = 0. \tag{5.25} \]

We will complete the proof by showing that $\operatorname{span}(u, Tu)$, which clearly
has dimension 1 or 2, is invariant under $T$. To do this, consider a typical
element of $\operatorname{span}(u, Tu)$ of the form $au + bTu$, where $a, b \in \mathbf{R}$. Then

\begin{align*}
T(au + bTu) &= aTu + bT^2 u \\
            &= aTu - b\alpha_j Tu - b\beta_j u,
\end{align*}

where the last equality comes from solving for $T^2 u$ in 5.25. The
equation above shows that $T(au + bTu) \in \operatorname{span}(u, Tu)$. Thus $\operatorname{span}(u, Tu)$
is invariant under $T$, as desired. ■

We will need one new piece of notation for the next proof. Suppose
$U$ and $W$ are subspaces of $V$ with

\[ V = U \oplus W. \]

Each vector $v \in V$ can be written uniquely in the form

\[ v = u + w, \]

where $u \in U$ and $w \in W$. With this representation, define $P_{U,W} \in \mathcal{L}(V)$
by

\[ P_{U,W} v = u. \]

$P_{U,W}$ is often called the projection onto $U$ with null space $W$.

You should verify that $P_{U,W} v = v$ if and only if $v \in U$. Interchanging
the roles of $U$ and $W$ in the representation above, we have $P_{W,U} v = w$.
Thus $v = P_{U,W} v + P_{W,U} v$ for every $v \in V$. You should verify that
$P_{U,W}^2 = P_{U,W}$; furthermore $\operatorname{range} P_{U,W} = U$ and $\operatorname{null} P_{U,W} = W$.
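
The properties of the projection listed above can be checked on a small example. The sketch below assumes a specific decomposition chosen only for illustration: $V = \mathbf{R}^2$, $U = \operatorname{span}((1, 0))$, and $W = \operatorname{span}((1, 1))$. The function names are hypothetical.

```python
# Assumed example: V = R^2, U = span((1, 0)), W = span((1, 1)), so V = U (+) W.

def decompose(v):
    """Write v = u + w with u in U and w in W; returns the pair (u, w)."""
    x, y = v
    w = (y, y)        # the W-component must carry the whole second coordinate
    u = (x - y, 0)    # what is left lies in U
    return u, w

def P_UW(v):
    """Projection onto U with null space W."""
    return decompose(v)[0]

def P_WU(v):
    """Projection onto W with null space U."""
    return decompose(v)[1]

v = (3, 5)
u, w = P_UW(v), P_WU(v)
assert (u[0] + w[0], u[1] + w[1]) == v      # v = P_UW v + P_WU v
assert P_UW(P_UW(v)) == P_UW(v)             # P_UW squared equals P_UW
assert P_UW((7, 0)) == (7, 0)               # vectors in U are fixed
assert P_UW((2, 2)) == (0, 0)               # vectors in W are sent to 0
```

Note that this projection is not an orthogonal projection; the null space $W$ is simply any complement of $U$.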

We have seen an example of an operator on $\mathbf{R}^2$ with no eigenvalues.
The following theorem shows that no such example exists on $\mathbf{R}^3$.

5.26 Theorem: Every operator on an odd-dimensional real vector 
space has an eigenvalue. 

Proof: Suppose $V$ is a real vector space with odd dimension. We
will prove that every operator on $V$ has an eigenvalue by induction (in
steps of size 2) on the dimension of $V$. To get started, note that the
desired result obviously holds if $\dim V = 1$.

Now suppose that $\dim V$ is an odd number greater than 1. Using
induction, we can assume that the desired result holds for all operators
on all real vector spaces with dimension 2 less than $\dim V$. Suppose
$T \in \mathcal{L}(V)$. We need to prove that $T$ has an eigenvalue. If it does, we are
done. If not, then by 5.24 there is a two-dimensional subspace $U$ of $V$
that is invariant under $T$. Let $W$ be any subspace of $V$ such that

\[ V = U \oplus W; \]

2.13 guarantees that such a $W$ exists.

Because $W$ has dimension 2 less than $\dim V$, we would like to apply
our induction hypothesis to $T|_W$. However, $W$ might not be invariant
under $T$, meaning that $T|_W$ might not be an operator on $W$. We will
compose with the projection $P_{W,U}$ to get an operator on $W$. Specifically,
define $S \in \mathcal{L}(W)$ by

\[ Sw = P_{W,U}(Tw) \]

for $w \in W$. By our induction hypothesis, $S$ has an eigenvalue $\lambda$. We
will show that this $\lambda$ is also an eigenvalue for $T$.

Let $w \in W$ be a nonzero eigenvector for $S$ corresponding to the
eigenvalue $\lambda$; thus $(S - \lambda I) w = 0$. We would be done if $w$ were an
eigenvector for $T$ with eigenvalue $\lambda$; unfortunately that need not be
true. So we will look for an eigenvector of $T$ in $U + \operatorname{span}(w)$. To do
that, consider a typical vector $u + aw$ in $U + \operatorname{span}(w)$, where $u \in U$
and $a \in \mathbf{R}$. We have

\begin{align*}
(T - \lambda I)(u + aw) &= Tu - \lambda u + a(Tw - \lambda w) \\
  &= Tu - \lambda u + a\bigl(P_{U,W}(Tw) + P_{W,U}(Tw) - \lambda w\bigr) \\
  &= Tu - \lambda u + a\bigl(P_{U,W}(Tw) + Sw - \lambda w\bigr) \\
  &= Tu - \lambda u + a P_{U,W}(Tw).
\end{align*}

Note that on the right side of the last equation, $Tu \in U$ (because $U$
is invariant under $T$), $\lambda u \in U$ (because $u \in U$), and $a P_{U,W}(Tw) \in U$
(from the definition of $P_{U,W}$). Thus $T - \lambda I$ maps $U + \operatorname{span}(w)$ into $U$.

Because $U + \operatorname{span}(w)$ has a larger dimension than $U$, this means that
$(T - \lambda I)|_{U + \operatorname{span}(w)}$ is not injective (see 3.5). In other words, there exists
a nonzero vector $v \in U + \operatorname{span}(w) \subset V$ such that $(T - \lambda I)v = 0$. Thus
$T$ has an eigenvalue, as desired. ■
Exercises

1. Suppose $T \in \mathcal{L}(V)$. Prove that if $U_1, \dots, U_m$ are subspaces of $V$
invariant under $T$, then $U_1 + \dots + U_m$ is invariant under $T$.

2. Suppose $T \in \mathcal{L}(V)$. Prove that the intersection of any collection
of subspaces of $V$ invariant under $T$ is invariant under $T$.

3. Prove or give a counterexample: if $U$ is a subspace of $V$ that is
invariant under every operator on $V$, then $U = \{0\}$ or $U = V$.

4. Suppose that $S, T \in \mathcal{L}(V)$ are such that $ST = TS$. Prove that
$\operatorname{null}(T - \lambda I)$ is invariant under $S$ for every $\lambda \in \mathbf{F}$.

5. Define $T \in \mathcal{L}(\mathbf{F}^2)$ by

\[ T(w, z) = (z, w). \]

Find all eigenvalues and eigenvectors of $T$.

6. Define $T \in \mathcal{L}(\mathbf{F}^3)$ by

\[ T(z_1, z_2, z_3) = (2z_2, 0, 5z_3). \]

Find all eigenvalues and eigenvectors of $T$.

7. Suppose $n$ is a positive integer and $T \in \mathcal{L}(\mathbf{F}^n)$ is defined by

\[ T(x_1, \dots, x_n) = (x_1 + \dots + x_n, \dots, x_1 + \dots + x_n); \]

in other words, $T$ is the operator whose matrix (with respect to
the standard basis) consists of all 1's. Find all eigenvalues and
eigenvectors of $T$.

8. Find all eigenvalues and eigenvectors of the backward shift
operator $T \in \mathcal{L}(\mathbf{F}^\infty)$ defined by

\[ T(z_1, z_2, z_3, \dots) = (z_2, z_3, \dots). \]

9. Suppose $T \in \mathcal{L}(V)$ and $\dim \operatorname{range} T = k$. Prove that $T$ has at
most $k + 1$ distinct eigenvalues.

10. Suppose $T \in \mathcal{L}(V)$ is invertible and $\lambda \in \mathbf{F} \setminus \{0\}$. Prove that $\lambda$ is
an eigenvalue of $T$ if and only if $1/\lambda$ is an eigenvalue of $T^{-1}$.
11. Suppose $S, T \in \mathcal{L}(V)$. Prove that $ST$ and $TS$ have the same
eigenvalues.

12. Suppose $T \in \mathcal{L}(V)$ is such that every vector in $V$ is an eigenvector
of $T$. Prove that $T$ is a scalar multiple of the identity operator.

13. Suppose $T \in \mathcal{L}(V)$ is such that every subspace of $V$ with
dimension $\dim V - 1$ is invariant under $T$. Prove that $T$ is a scalar
multiple of the identity operator.

14. Suppose $S, T \in \mathcal{L}(V)$ and $S$ is invertible. Prove that if $p \in \mathcal{P}(\mathbf{F})$
is a polynomial, then

\[ p(STS^{-1}) = S\,p(T)\,S^{-1}. \]

15. Suppose $\mathbf{F} = \mathbf{C}$, $T \in \mathcal{L}(V)$, $p \in \mathcal{P}(\mathbf{C})$, and $a \in \mathbf{C}$. Prove that $a$ is
an eigenvalue of $p(T)$ if and only if $a = p(\lambda)$ for some eigenvalue
$\lambda$ of $T$.

16. Show that the result in the previous exercise does not hold if $\mathbf{C}$
is replaced with $\mathbf{R}$.

17. Suppose $V$ is a complex vector space and $T \in \mathcal{L}(V)$. Prove
that $T$ has an invariant subspace of dimension $j$ for each $j =
1, \dots, \dim V$.

18. Give an example of an operator whose matrix with respect to
some basis contains only 0's on the diagonal, but the operator is
invertible.

19. Give an example of an operator whose matrix with respect to
some basis contains only nonzero numbers on the diagonal, but
the operator is not invertible.

20. Suppose that $T \in \mathcal{L}(V)$ has $\dim V$ distinct eigenvalues and that
$S \in \mathcal{L}(V)$ has the same eigenvectors as $T$ (not necessarily with
the same eigenvalues). Prove that $ST = TS$.

21. Suppose $P \in \mathcal{L}(V)$ and $P^2 = P$. Prove that $V = \operatorname{null} P \oplus \operatorname{range} P$.

22. Suppose $V = U \oplus W$, where $U$ and $W$ are nonzero subspaces of $V$.
Find all eigenvalues and eigenvectors of $P_{U,W}$.

Exercises 18 and 19 show that 5.16 fails without the hypothesis
that an upper-triangular matrix is under consideration.

23. Give an example of an operator $T \in \mathcal{L}(\mathbf{R}^4)$ such that $T$ has no
(real) eigenvalues.

24. Suppose $V$ is a real vector space and $T \in \mathcal{L}(V)$ has no
eigenvalues. Prove that every subspace of $V$ invariant under $T$ has even
dimension.



Chapter 6 


Inner-Product Spaces


In making the definition of a vector space, we generalized the linear
structure (addition and scalar multiplication) of $\mathbf{R}^2$ and $\mathbf{R}^3$. We
ignored other important features, such as the notions of length and
angle. These ideas are embedded in the concept we now investigate,
inner products.

Recall that $\mathbf{F}$ denotes $\mathbf{R}$ or $\mathbf{C}$.

Also, $V$ is a finite-dimensional, nonzero vector space over $\mathbf{F}$.


Inner Products

To motivate the concept of inner product, let's think of vectors in $\mathbf{R}^2$
and $\mathbf{R}^3$ as arrows with initial point at the origin. The length of a
vector $x$ in $\mathbf{R}^2$ or $\mathbf{R}^3$ is called the norm of $x$, denoted $\|x\|$. Thus for
$x = (x_1, x_2) \in \mathbf{R}^2$, we have $\|x\| = \sqrt{x_1^2 + x_2^2}$.

[Figure: The length of this vector $x$ is $\sqrt{x_1^2 + x_2^2}$.]

If we think of vectors as points instead of arrows, then $\|x\|$
should be interpreted as the distance from the point $x$ to the
origin.

Similarly, for $x = (x_1, x_2, x_3) \in \mathbf{R}^3$, we have $\|x\| = \sqrt{x_1^2 + x_2^2 + x_3^2}$.
Even though we cannot draw pictures in higher dimensions, the
generalization to $\mathbf{R}^n$ is obvious: we define the norm of $x = (x_1, \dots, x_n) \in \mathbf{R}^n$
by

\[ \|x\| = \sqrt{x_1^2 + \dots + x_n^2}. \]

The norm is not linear on $\mathbf{R}^n$. To inject linearity into the discussion,
we introduce the dot product. For $x, y \in \mathbf{R}^n$, the dot product of $x$
and $y$, denoted $x \cdot y$, is defined by

\[ x \cdot y = x_1 y_1 + \dots + x_n y_n, \]

where $x = (x_1, \dots, x_n)$ and $y = (y_1, \dots, y_n)$. Note that the dot product
of two vectors in $\mathbf{R}^n$ is a number, not a vector. Obviously $x \cdot x = \|x\|^2$
for all $x \in \mathbf{R}^n$. In particular, $x \cdot x \geq 0$ for all $x \in \mathbf{R}^n$, with equality if
and only if $x = 0$. Also, if $y \in \mathbf{R}^n$ is fixed, then clearly the map from $\mathbf{R}^n$
to $\mathbf{R}$ that sends $x \in \mathbf{R}^n$ to $x \cdot y$ is linear. Furthermore, $x \cdot y = y \cdot x$
for all $x, y \in \mathbf{R}^n$.
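
The dot product and norm on $\mathbf{R}^n$ translate directly into code. The following minimal sketch represents vectors as tuples of floats; the function names `dot` and `norm` are chosen here for illustration.

```python
import math

def dot(x, y):
    """Dot product x . y = x_1 y_1 + ... + x_n y_n."""
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    """Norm of x, using the identity x . x = ||x||^2."""
    return math.sqrt(dot(x, x))

x, y = (3.0, 4.0), (1.0, 2.0)
print(norm(x))                   # 5.0
print(dot(x, y))                 # 11.0
print(dot(x, y) == dot(y, x))    # True: the dot product is symmetric
```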

An inner product is a generalization of the dot product. At this
point you should be tempted to guess that an inner product is defined
by abstracting the properties of the dot product discussed in the
paragraph above. For real vector spaces, that guess is correct. However,
so that we can make a definition that will be useful for both real and
complex vector spaces, we need to examine the complex case before
making the definition.

Recall that if $\lambda = a + bi$, where $a, b \in \mathbf{R}$, then the absolute value
of $\lambda$ is defined by

\[ |\lambda| = \sqrt{a^2 + b^2}, \]

the complex conjugate of $\lambda$ is defined by

\[ \bar{\lambda} = a - bi, \]

and the equation

\[ |\lambda|^2 = \lambda \bar{\lambda} \]

connects these two concepts (see page 69 for the definitions and the
basic properties of the absolute value and complex conjugate). For
$z = (z_1, \dots, z_n) \in \mathbf{C}^n$, we define the norm of $z$ by

\[ \|z\| = \sqrt{|z_1|^2 + \dots + |z_n|^2}. \]

The absolute values are needed because we want $\|z\|$ to be a
nonnegative number. Note that

\[ \|z\|^2 = z_1 \overline{z_1} + \dots + z_n \overline{z_n}. \]

We want to think of $\|z\|^2$ as the inner product of $z$ with itself, as we
did in $\mathbf{R}^n$. The equation above thus suggests that the inner product of
$w = (w_1, \dots, w_n) \in \mathbf{C}^n$ with $z$ should equal

\[ w_1 \overline{z_1} + \dots + w_n \overline{z_n}. \]

If the roles of $w$ and $z$ were interchanged, the expression above
would be replaced with its complex conjugate. In other words, we
should expect that the inner product of $w$ with $z$ equals the complex
conjugate of the inner product of $z$ with $w$. With that motivation, we
are now ready to define an inner product on $V$, which may be a real or
a complex vector space.

An inner product on $V$ is a function that takes each ordered pair
$(u, v)$ of elements of $V$ to a number $\langle u, v \rangle \in \mathbf{F}$ and has the following
properties:

positivity
    $\langle v, v \rangle \geq 0$ for all $v \in V$;

definiteness
    $\langle v, v \rangle = 0$ if and only if $v = 0$;

additivity in first slot
    $\langle u + v, w \rangle = \langle u, w \rangle + \langle v, w \rangle$ for all $u, v, w \in V$;

homogeneity in first slot
    $\langle av, w \rangle = a \langle v, w \rangle$ for all $a \in \mathbf{F}$ and all $v, w \in V$;

conjugate symmetry
    $\langle v, w \rangle = \overline{\langle w, v \rangle}$ for all $v, w \in V$.

If $z$ is a complex number, then the statement $z \geq 0$ means
that $z$ is real and nonnegative.

Recall that every real number equals its complex conjugate. Thus
if we are dealing with a real vector space, then in the last condition
above we can dispense with the complex conjugate and simply state
that $\langle v, w \rangle = \langle w, v \rangle$ for all $v, w \in V$.

An inner-product space is a vector space $V$ along with an inner
product on $V$.

The most important example of an inner-product space is $\mathbf{F}^n$. We
can define an inner product on $\mathbf{F}^n$ by

\[ \langle (w_1, \dots, w_n), (z_1, \dots, z_n) \rangle = w_1 \overline{z_1} + \dots + w_n \overline{z_n}, \tag{6.1} \]

as you should verify. This inner product, which provided our
motivation for the definition of an inner product, is called the Euclidean inner
product on $\mathbf{F}^n$. When $\mathbf{F}^n$ is referred to as an inner-product space, you
should assume that the inner product is the Euclidean inner product
unless explicitly told otherwise.

If we are dealing with $\mathbf{R}^n$ rather than $\mathbf{C}^n$, then again the complex
conjugate can be ignored.

There are other inner products on $\mathbf{F}^n$ in addition to the Euclidean
inner product. For example, if $c_1, \dots, c_n$ are positive numbers, then we
can define an inner product on $\mathbf{F}^n$ by

\[ \langle (w_1, \dots, w_n), (z_1, \dots, z_n) \rangle = c_1 w_1 \overline{z_1} + \dots + c_n w_n \overline{z_n}, \]

as you should verify. Of course, if all the $c$'s equal 1, then we get the
Euclidean inner product.
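
Both the Euclidean inner product and its weighted variant can be sketched with Python's built-in complex numbers. The function name `inner` and the optional `weights` parameter are assumptions made for this illustration; the weights are taken to be positive real numbers, as in the text.

```python
def inner(w, z, weights=None):
    """<w, z> = sum of c_k * w_k * conj(z_k); weights default to 1 (Euclidean case)."""
    if weights is None:
        weights = [1] * len(w)
    return sum(c * a * b.conjugate() for c, a, b in zip(weights, w, z))

w = (1 + 2j, 3j)
z = (2 - 1j, 1 + 1j)

print(inner(w, z))                                  # (3+8j)
print(inner(z, w) == inner(w, z).conjugate())       # True: conjugate symmetry
print(inner(w, w))                                  # (14+0j): real and nonnegative
print(inner(w, w, weights=[2, 1]))                  # (19+0j)
```

Conjugating the second argument rather than the first matches the convention in the text, where the inner product is linear in the first slot.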

As another example of an inner-product space, consider the vector
space $\mathcal{P}_m(\mathbf{F})$ of all polynomials with coefficients in $\mathbf{F}$ and degree at
most $m$. We can define an inner product on $\mathcal{P}_m(\mathbf{F})$ by

\[ \langle p, q \rangle = \int_0^1 p(x) \overline{q(x)} \, dx, \tag{6.2} \]

as you should verify. Once again, if $\mathbf{F} = \mathbf{R}$, then the complex conjugate
is not needed.
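
For real polynomials the inner product 6.2 can be computed exactly, because the integral of $x^k$ over $[0, 1]$ is $1/(k+1)$. The sketch below assumes real (rational) coefficients stored as a list `[p_0, p_1, ...]`, lowest degree first; the name `poly_inner` is hypothetical.

```python
from fractions import Fraction

def poly_inner(p, q):
    """<p, q> = integral over [0, 1] of p(x) q(x) dx for real coefficient lists.

    Expands the product term by term and integrates x^(i+j) to 1/(i+j+1).
    """
    return sum(Fraction(a) * Fraction(b) / (i + j + 1)
               for i, a in enumerate(p)
               for j, b in enumerate(q))

print(poly_inner([1], [1]))          # 1       (integral of 1)
print(poly_inner([0, 1], [0, 1]))    # 1/3     (integral of x^2)
print(poly_inner([1], [-1, 2]))      # 0       (1 and 2x - 1 are orthogonal)
```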

Let’s agree for the rest of this chapter that 
V is a finite-dimensional inner-product space over F. 

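In the definition of an inner product, the conditions of additivity
and homogeneity in the first slot can be combined into a requirement
of linearity in the first slot. More precisely, for each fixed $w \in V$, the
function that takes $v$ to $\langle v, w \rangle$ is a linear map from $V$ to $\mathbf{F}$. Because
every linear map takes 0 to 0, we must have

\[ \langle 0, w \rangle = 0 \]

for every $w \in V$. Thus we also have

\[ \langle w, 0 \rangle = 0 \]

for every $w \in V$ (by the conjugate symmetry property).

In an inner-product space, we have additivity in the second slot as
well as the first slot. Proof:

\begin{align*}
\langle u, v + w \rangle &= \overline{\langle v + w, u \rangle} \\
  &= \overline{\langle v, u \rangle + \langle w, u \rangle} \\
  &= \overline{\langle v, u \rangle} + \overline{\langle w, u \rangle} \\
  &= \langle u, v \rangle + \langle u, w \rangle;
\end{align*}

here $u, v, w \in V$.

In an inner-product space, we have conjugate homogeneity in the
second slot, meaning that $\langle u, av \rangle = \bar{a} \langle u, v \rangle$ for all scalars $a \in \mathbf{F}$.
Proof:

\begin{align*}
\langle u, av \rangle &= \overline{\langle av, u \rangle} \\
  &= \overline{a \langle v, u \rangle} \\
  &= \bar{a}\, \overline{\langle v, u \rangle} \\
  &= \bar{a} \langle u, v \rangle;
\end{align*}

here $a \in \mathbf{F}$ and $u, v \in V$. Note that in a real vector space, conjugate
homogeneity is the same as homogeneity.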
Norms

For $v \in V$, we define the norm of $v$, denoted $\|v\|$, by

\[ \|v\| = \sqrt{\langle v, v \rangle}. \]

For example, if $(z_1, \dots, z_n) \in \mathbf{F}^n$ (with the Euclidean inner product),
then

\[ \|(z_1, \dots, z_n)\| = \sqrt{|z_1|^2 + \dots + |z_n|^2}. \]

As another example, if $p \in \mathcal{P}_m(\mathbf{F})$ (with inner product given by 6.2),
then

\[ \|p\| = \sqrt{\int_0^1 |p(x)|^2 \, dx}. \]

Note that $\|v\| = 0$ if and only if $v = 0$ (because $\langle v, v \rangle = 0$ if and only
if $v = 0$). Another easy property of the norm is that $\|av\| = |a| \, \|v\|$
for all $a \in \mathbf{F}$ and all $v \in V$. Here's the proof:

\begin{align*}
\|av\|^2 &= \langle av, av \rangle \\
  &= a \langle v, av \rangle \\
  &= a \bar{a} \langle v, v \rangle \\
  &= |a|^2 \|v\|^2;
\end{align*}

taking square roots now gives the desired equality. This proof
illustrates a general principle: working with norms squared is usually easier
than working directly with norms.

Two vectors $u, v \in V$ are said to be orthogonal if $\langle u, v \rangle = 0$. Note
that the order of the vectors does not matter because $\langle u, v \rangle = 0$ if
and only if $\langle v, u \rangle = 0$. Instead of saying that $u$ and $v$ are orthogonal,
sometimes we say that $u$ is orthogonal to $v$. Clearly 0 is orthogonal
to every vector. Furthermore, 0 is the only vector that is orthogonal to
itself.

Some mathematicians use the term perpendicular, which means the
same as orthogonal. The word orthogonal comes from the Greek
word orthogonios, which means right-angled.

For the special case where $V = \mathbf{R}^2$, the next theorem is over 2,500
years old.

6.3 Pythagorean Theorem: If $u, v$ are orthogonal vectors in $V$, then

\[ \|u + v\|^2 = \|u\|^2 + \|v\|^2. \tag{6.4} \]
Proof: Suppose that $u, v$ are orthogonal vectors in $V$. Then

\begin{align*}
\|u + v\|^2 &= \langle u + v, u + v \rangle \\
  &= \|u\|^2 + \|v\|^2 + \langle u, v \rangle + \langle v, u \rangle \\
  &= \|u\|^2 + \|v\|^2,
\end{align*}

as desired. ■

The proof of the Pythagorean theorem shows that 6.4 holds if
and only if $\langle u, v \rangle + \langle v, u \rangle$, which equals $2 \operatorname{Re} \langle u, v \rangle$, is 0.
Thus the converse of the Pythagorean theorem holds in real
inner-product spaces.

Suppose $u, v \in V$. We would like to write $u$ as a scalar multiple of $v$
plus a vector $w$ orthogonal to $v$, as suggested in the next picture.

[Figure: An orthogonal decomposition]

To discover how to write $u$ as a scalar multiple of $v$ plus a vector
orthogonal to $v$, let $a \in \mathbf{F}$ denote a scalar. Then

\[ u = av + (u - av). \]

Thus we need to choose $a$ so that $v$ is orthogonal to $(u - av)$. In other
words, we want

\[ 0 = \langle u - av, v \rangle = \langle u, v \rangle - a \|v\|^2. \]

The equation above shows that we should choose $a$ to be $\langle u, v \rangle / \|v\|^2$
(assume that $v \neq 0$ to avoid division by 0). Making this choice of $a$, we
can write

\[ u = \frac{\langle u, v \rangle}{\|v\|^2} v + \left( u - \frac{\langle u, v \rangle}{\|v\|^2} v \right). \tag{6.5} \]

As you should verify, if $v \neq 0$ then the equation above writes $u$ as a
scalar multiple of $v$ plus a vector orthogonal to $v$.
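
The decomposition 6.5 can be checked on concrete vectors. The sketch below assumes real vectors with integer entries (so that exact fractions can be used) and the ordinary dot product as the inner product; the function name `orthogonal_decomposition` is hypothetical.

```python
from fractions import Fraction

def inner(u, v):
    """Real Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))

def orthogonal_decomposition(u, v):
    """Return (a*v, w) with u = a*v + w and <w, v> = 0, where a = <u, v> / ||v||^2.

    Assumes integer entries and v != 0.
    """
    a = Fraction(inner(u, v), inner(v, v))
    along = tuple(a * x for x in v)
    w = tuple(x - y for x, y in zip(u, along))
    return along, w

along, w = orthogonal_decomposition((2, 3), (1, 1))
print(along)              # (Fraction(5, 2), Fraction(5, 2))
print(w)                  # (Fraction(-1, 2), Fraction(1, 2))
print(inner(w, (1, 1)))   # 0
```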

The equation above will be used in the proof of the next theorem, 
which gives one of the most important inequalities in mathematics. 
6.6 Cauchy-Schwarz Inequality: If $u, v \in V$, then

\[ |\langle u, v \rangle| \leq \|u\| \, \|v\|. \tag{6.7} \]

This inequality is an equality if and only if one of $u, v$ is a scalar
multiple of the other.

In 1821 the French mathematician Augustin-Louis Cauchy
showed that this inequality holds for the inner product defined
by 6.1. In 1886 the German mathematician Hermann Schwarz
showed that this inequality holds for the inner product defined
by 6.2.


Proof: Let $u, v \in V$. If $v = 0$, then both sides of 6.7 equal 0 and
the desired inequality holds. Thus we can assume that $v \neq 0$. Consider
the orthogonal decomposition

\[ u = \frac{\langle u, v \rangle}{\|v\|^2} v + w, \]

where $w$ is orthogonal to $v$ (here $w$ equals the second term on the right
side of 6.5). By the Pythagorean theorem,

\begin{align*}
\|u\|^2 &= \left\| \frac{\langle u, v \rangle}{\|v\|^2} v \right\|^2 + \|w\|^2 \\
  &= \frac{|\langle u, v \rangle|^2}{\|v\|^2} + \|w\|^2 \\
  &\geq \frac{|\langle u, v \rangle|^2}{\|v\|^2}. \tag{6.8}
\end{align*}

Multiplying both sides of this inequality by $\|v\|^2$ and then taking square
roots gives the Cauchy-Schwarz inequality 6.7.

Looking at the proof of the Cauchy-Schwarz inequality, note that 6.7
is an equality if and only if 6.8 is an equality. Obviously this happens if
and only if $w = 0$. But $w = 0$ if and only if $u$ is a multiple of $v$ (see 6.5).
Thus the Cauchy-Schwarz inequality is an equality if and only if $u$ is a
scalar multiple of $v$ or $v$ is a scalar multiple of $u$ (or both; the phrasing
has been chosen to cover cases in which either $u$ or $v$ equals 0). ■
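
The inequality and its equality case can be checked numerically. To avoid square roots, the sketch below works with the squared form $\langle u, v \rangle^2 \leq \|u\|^2 \|v\|^2$, which is equivalent to 6.7 for real vectors. It assumes the Euclidean inner product on $\mathbf{R}^n$; the name `cauchy_schwarz_gap` is hypothetical.

```python
def inner(u, v):
    """Real Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))

def cauchy_schwarz_gap(u, v):
    """Return ||u||^2 ||v||^2 - <u, v>^2, which 6.7 says is nonnegative (real case)."""
    return inner(u, u) * inner(v, v) - inner(u, v) ** 2

print(cauchy_schwarz_gap((1, 2, 3), (4, 5, 6)))   # 54: strict inequality
print(cauchy_schwarz_gap((1, 2, 3), (2, 4, 6)))   # 0: v is a scalar multiple of u
print(cauchy_schwarz_gap((1, 2, 3), (0, 0, 0)))   # 0: the zero vector is a multiple of u
```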


The next result is called the triangle inequality because of its
geometric interpretation that the length of any side of a triangle is less
than the sum of the lengths of the other two sides.

[Figure: The triangle inequality]

6.9 Triangle Inequality: If $u, v \in V$, then

\[ \|u + v\| \leq \|u\| + \|v\|. \tag{6.10} \]

This inequality is an equality if and only if one of $u, v$ is a nonnegative
multiple of the other.

The triangle inequality can be used to show that the shortest path
between two points is a straight line segment.

Proof: Let $u, v \in V$. Then

\begin{align*}
\|u + v\|^2 &= \langle u + v, u + v \rangle \\
  &= \langle u, u \rangle + \langle v, v \rangle + \langle u, v \rangle + \langle v, u \rangle \\
  &= \langle u, u \rangle + \langle v, v \rangle + \langle u, v \rangle + \overline{\langle u, v \rangle} \\
  &= \|u\|^2 + \|v\|^2 + 2 \operatorname{Re} \langle u, v \rangle \\
  &\leq \|u\|^2 + \|v\|^2 + 2 |\langle u, v \rangle| \tag{6.11} \\
  &\leq \|u\|^2 + \|v\|^2 + 2 \|u\| \, \|v\| \tag{6.12} \\
  &= (\|u\| + \|v\|)^2,
\end{align*}

where 6.12 follows from the Cauchy-Schwarz inequality (6.6). Taking
square roots of both sides of the inequality above gives the triangle
inequality 6.10.

The proof above shows that the triangle inequality 6.10 is an equality
if and only if we have equality in 6.11 and 6.12. Thus we have equality
in the triangle inequality 6.10 if and only if

\[ \langle u, v \rangle = \|u\| \, \|v\|. \tag{6.13} \]

If one of $u, v$ is a nonnegative multiple of the other, then 6.13 holds, as
you should verify. Conversely, suppose 6.13 holds. Then the condition
for equality in the Cauchy-Schwarz inequality (6.6) implies that one of
$u, v$ must be a scalar multiple of the other. Clearly 6.13 forces the
scalar in question to be nonnegative, as desired. ■

The next result is called the parallelogram equality because of its
geometric interpretation: in any parallelogram, the sum of the squares
of the lengths of the diagonals equals the sum of the squares of the
lengths of the four sides.
6.14 Parallelogram Equality: If $u, v \in V$, then

\[ \|u + v\|^2 + \|u - v\|^2 = 2(\|u\|^2 + \|v\|^2). \]

Proof: Let $u, v \in V$. Then

\begin{align*}
\|u + v\|^2 + \|u - v\|^2 &= \langle u + v, u + v \rangle + \langle u - v, u - v \rangle \\
  &= \|u\|^2 + \|v\|^2 + \langle u, v \rangle + \langle v, u \rangle \\
  &\quad + \|u\|^2 + \|v\|^2 - \langle u, v \rangle - \langle v, u \rangle \\
  &= 2(\|u\|^2 + \|v\|^2),
\end{align*}

as desired. ■
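
Both the triangle inequality and the parallelogram equality can be spot-checked on vectors in $\mathbf{R}^2$. The following sketch assumes the Euclidean inner product; the helper names are hypothetical, and the parallelogram check uses squared norms so that it is exact for integer inputs.

```python
import math

def inner(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(inner(u, u))

def add(u, v):
    return tuple(a + b for a, b in zip(u, v))

def sub(u, v):
    return tuple(a - b for a, b in zip(u, v))

def parallelogram_sides(u, v):
    """Return both sides of the parallelogram equality as a pair (left, right)."""
    left = inner(add(u, v), add(u, v)) + inner(sub(u, v), sub(u, v))
    right = 2 * (inner(u, u) + inner(v, v))
    return left, right

def triangle_slack(u, v):
    """Return ||u|| + ||v|| - ||u + v||, which the triangle inequality says is >= 0."""
    return norm(u) + norm(v) - norm(add(u, v))

print(parallelogram_sides((1, 2), (3, -1)))   # (30, 30)
print(triangle_slack((1, 2), (3, -1)) > 0)    # True: strict inequality
print(triangle_slack((3, 4), (6, 8)))         # 0.0: v is a nonnegative multiple of u
```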

Orthonormal Bases

A list of vectors is called orthonormal if the vectors in it are
pairwise orthogonal and each vector has norm 1. In other words, a list
$(e_1, \dots, e_m)$ of vectors in $V$ is orthonormal if $\langle e_j, e_k \rangle$ equals 0 when
$j \neq k$ and equals 1 when $j = k$ (for $j, k = 1, \dots, m$). For example, the
standard basis in $\mathbf{F}^n$ is orthonormal. Orthonormal lists are particularly
easy to work with, as illustrated by the next proposition.

6.15 Proposition: If $(e_1, \dots, e_m)$ is an orthonormal list of vectors
in $V$, then

\[ \|a_1 e_1 + \dots + a_m e_m\|^2 = |a_1|^2 + \dots + |a_m|^2 \]

for all $a_1, \dots, a_m \in \mathbf{F}$.

Proof: Because each $e_j$ has norm 1, this follows easily from
repeated applications of the Pythagorean theorem (6.3). ■

Now we have the following easy but important corollary. 
6.16 Corollary: Every orthonormal list of vectors is linearly
independent.

Proof: Suppose $(e_1, \dots, e_m)$ is an orthonormal list of vectors in $V$
and $a_1, \dots, a_m \in \mathbf{F}$ are such that

\[ a_1 e_1 + \dots + a_m e_m = 0. \]

Then $|a_1|^2 + \dots + |a_m|^2 = 0$ (by 6.15), which means that all the $a_j$'s
are 0, as desired. ■


An orthonormal basis of $V$ is an orthonormal list of vectors in $V$
that is also a basis of $V$. For example, the standard basis is an
orthonormal basis of $\mathbf{F}^n$. Every orthonormal list of vectors in $V$ with length
$\dim V$ is automatically an orthonormal basis of $V$ (proof: by the
previous corollary, any such list must be linearly independent; because it
has the right length, it must be a basis; see 2.17). To illustrate this
principle, consider the following list of four vectors in $\mathbf{R}^4$:

\[
\bigl( (\tfrac12, \tfrac12, \tfrac12, \tfrac12),\ (\tfrac12, \tfrac12, -\tfrac12, -\tfrac12),\
(\tfrac12, -\tfrac12, -\tfrac12, \tfrac12),\ (-\tfrac12, \tfrac12, -\tfrac12, \tfrac12) \bigr).
\]

The verification that this list is orthonormal is easy (do it!); because we
have an orthonormal list of length four in a four-dimensional vector
space, it must be an orthonormal basis.

In general, given a basis $(e_1, \dots, e_n)$ of $V$ and a vector $v \in V$, we
know that there is some choice of scalars $a_1, \dots, a_n$ such that

\[ v = a_1 e_1 + \dots + a_n e_n, \]

but finding the $a_j$'s can be difficult. The next theorem shows, however,
that this is easy for an orthonormal basis.

6.17 Theorem: Suppose $(e_1, \dots, e_n)$ is an orthonormal basis of $V$.
Then

\[ v = \langle v, e_1 \rangle e_1 + \dots + \langle v, e_n \rangle e_n \tag{6.18} \]

and

\[ \|v\|^2 = |\langle v, e_1 \rangle|^2 + \dots + |\langle v, e_n \rangle|^2 \tag{6.19} \]

for every $v \in V$.

The importance of orthonormal bases stems mainly from this
theorem.
Proof: Let $v \in V$. Because $(e_1, \dots, e_n)$ is a basis of $V$, there exist
scalars $a_1, \dots, a_n$ such that

\[ v = a_1 e_1 + \dots + a_n e_n. \]

Take the inner product of both sides of this equation with $e_j$,
getting $\langle v, e_j \rangle = a_j$. Thus 6.18 holds. Clearly 6.19 follows from 6.18
and 6.15. ■
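
Theorem 6.17 can be illustrated with an orthonormal basis of $\mathbf{R}^4$. The sketch below assumes the four-vector list $((\tfrac12, \tfrac12, \tfrac12, \tfrac12), (\tfrac12, \tfrac12, -\tfrac12, -\tfrac12), (\tfrac12, -\tfrac12, -\tfrac12, \tfrac12), (-\tfrac12, \tfrac12, -\tfrac12, \tfrac12))$, first confirms that it is orthonormal, and then recovers a vector from its coefficients $\langle v, e_j \rangle$. The function names are hypothetical.

```python
from fractions import Fraction

h = Fraction(1, 2)
basis = [( h,  h,  h,  h),
         ( h,  h, -h, -h),
         ( h, -h, -h,  h),
         (-h,  h, -h,  h)]

def inner(u, v):
    """Real Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))

def coefficients(v, es):
    """Coefficients <v, e_j> of v with respect to an orthonormal basis es (6.18)."""
    return [inner(v, e) for e in es]

def combine(coeffs, es):
    """Form the linear combination sum of coeffs[j] * es[j]."""
    return tuple(sum(c * e[i] for c, e in zip(coeffs, es))
                 for i in range(len(es[0])))

# Orthonormality: <e_j, e_k> is 1 when j == k and 0 otherwise.
assert all(inner(basis[j], basis[k]) == (1 if j == k else 0)
           for j in range(4) for k in range(4))

v = (1, 2, 3, 4)
a = coefficients(v, basis)
print(a)                                          # coefficients 5, -2, 0, 1
assert combine(a, basis) == v                     # 6.18
assert sum(c * c for c in a) == inner(v, v)       # 6.19: both sides equal 30
```

No linear system has to be solved here; each coefficient is a single inner product, which is the practical content of the theorem.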


Now that we understand the usefulness of orthonormal bases, how
do we go about finding them? For example, does $\mathcal{P}_m(\mathbf{F})$, with inner
product given by integration on $[0, 1]$ (see 6.2), have an orthonormal
basis? As we will see, the next result will lead to answers to these
questions. The algorithm used in the next proof is called the Gram-Schmidt
procedure. It gives a method for turning a linearly independent list into
an orthonormal list with the same span as the original list.

The Danish mathematician Jorgen Gram (1850-1916) and
the German mathematician Erhard Schmidt (1876-1959)
popularized this algorithm for constructing orthonormal lists.

6.20 Gram-Schmidt: If $(v_1, \dots, v_m)$ is a linearly independent list
of vectors in $V$, then there exists an orthonormal list $(e_1, \dots, e_m)$ of
vectors in $V$ such that

6.21 $\operatorname{span}(v_1, \dots, v_j) = \operatorname{span}(e_1, \dots, e_j)$

for $j = 1, \dots, m$.


Proof: Suppose $(v_1, \dots, v_m)$ is a linearly independent list of vectors
in $V$. To construct the $e$'s, start by setting $e_1 = v_1 / \|v_1\|$. This
satisfies 6.21 for $j = 1$. We will choose $e_2, \dots, e_m$ inductively, as follows.
Suppose $j > 1$ and an orthonormal list $(e_1, \dots, e_{j-1})$ has been
chosen so that

6.22 $\operatorname{span}(v_1, \dots, v_{j-1}) = \operatorname{span}(e_1, \dots, e_{j-1})$.

Let

6.23 $e_j = \dfrac{v_j - \langle v_j, e_1 \rangle e_1 - \cdots - \langle v_j, e_{j-1} \rangle e_{j-1}}{\|v_j - \langle v_j, e_1 \rangle e_1 - \cdots - \langle v_j, e_{j-1} \rangle e_{j-1}\|}$.

Note that $v_j \notin \operatorname{span}(v_1, \dots, v_{j-1})$ (because $(v_1, \dots, v_m)$ is linearly
independent) and thus $v_j \notin \operatorname{span}(e_1, \dots, e_{j-1})$. Hence we are not dividing
by 0 in the equation above, and so $e_j$ is well defined. Dividing a vector
by its norm produces a new vector with norm 1; thus $\|e_j\| = 1$.

Let $1 \le k < j$. Then

$\langle e_j, e_k \rangle = \left\langle \dfrac{v_j - \langle v_j, e_1 \rangle e_1 - \cdots - \langle v_j, e_{j-1} \rangle e_{j-1}}{\|v_j - \langle v_j, e_1 \rangle e_1 - \cdots - \langle v_j, e_{j-1} \rangle e_{j-1}\|}, e_k \right\rangle$

$= \dfrac{\langle v_j, e_k \rangle - \langle v_j, e_k \rangle}{\|v_j - \langle v_j, e_1 \rangle e_1 - \cdots - \langle v_j, e_{j-1} \rangle e_{j-1}\|}$

$= 0$.

Thus $(e_1, \dots, e_j)$ is an orthonormal list.

From 6.23, we see that $v_j \in \operatorname{span}(e_1, \dots, e_j)$. Combining this information
with 6.22 shows that

$\operatorname{span}(v_1, \dots, v_j) \subset \operatorname{span}(e_1, \dots, e_j)$.

Both lists above are linearly independent (the $v$'s by hypothesis, the $e$'s
by orthonormality and 6.16). Thus both subspaces above have dimension
$j$, and hence they must be equal, completing the proof. ■
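The inductive construction in the proof translates almost line by line into code. The following is a minimal sketch for vectors in $\mathbf{R}^n$ with the Euclidean inner product; the function name `gram_schmidt` and the tolerance `tol` used to detect a zero denominator in floating point are assumptions of this sketch, not part of the text.

```python
import math

def inner(u, v):
    """Euclidean inner product on R^n (real scalars only)."""
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(vectors, tol=1e-12):
    """Turn a linearly independent list into an orthonormal list (6.20)."""
    basis = []
    for v in vectors:
        # Numerator of 6.23: subtract from v its components along e_1, ..., e_{j-1}.
        r = list(v)
        for e in basis:
            c = inner(v, e)
            r = [ri - c * ei for ri, ei in zip(r, e)]
        norm = math.sqrt(inner(r, r))
        # The proof shows the denominator is nonzero for an independent list.
        if norm < tol:
            raise ValueError("list is not linearly independent")
        basis.append([ri / norm for ri in r])
    return basis

for e in gram_schmidt([(1.0, 1.0, 0.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0)]):
    print(e)
```

At step $j$ only $e_1, \dots, e_{j-1}$ and $v_j$ are used, which is why the spans agree for every $j$ as stated in 6.21.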

Now we can settle the question of the existence of orthonormal 
bases. 


6.24 Corollary: Every finite-dimensional inner-product space has an 
orthonormal basis. 

Proof: Choose a basis of $V$. Apply the Gram-Schmidt procedure
(6.20) to it, producing an orthonormal list. This orthonormal list is
linearly independent (by 6.16) and its span equals $V$. Thus it is an
orthonormal basis of $V$. ■


Until this corollary, 
nothing we had done 
with inner-product 
spaces required our 
standing assumption 
that V is finite 
dimensional. 


As we will soon see, sometimes we need to know not only that an 
orthonormal basis exists, but also that any orthonormal list can be 
extended to an orthonormal basis. In the next corollary, the Gram- 
Schmidt procedure shows that such an extension is always possible. 

6.25 Corollary: Every orthonormal list of vectors in $V$ can be extended
to an orthonormal basis of $V$.


Proof: Suppose $(e_1, \dots, e_m)$ is an orthonormal list of vectors in $V$.
Then $(e_1, \dots, e_m)$ is linearly independent (by 6.16), and hence it can be
extended to a basis $(e_1, \dots, e_m, v_1, \dots, v_n)$ of $V$ (see 2.12). Now apply
the Gram-Schmidt procedure (6.20) to $(e_1, \dots, e_m, v_1, \dots, v_n)$, producing
an orthonormal list

6.26 $(e_1, \dots, e_m, f_1, \dots, f_n)$;

here the Gram-Schmidt procedure leaves the first $m$ vectors unchanged
because they are already orthonormal. Clearly 6.26 is an orthonormal
basis of $V$ because it is linearly independent (by 6.16) and its span
equals $V$. Hence we have our extension of $(e_1, \dots, e_m)$ to an orthonormal
basis of $V$. ■

Recall that a matrix is called upper triangular if all entries below the 
diagonal equal 0. In other words, an upper-triangular matrix looks like 
this: 

$\begin{bmatrix} * & & * \\ & \ddots & \\ 0 & & * \end{bmatrix}$.

In the last chapter we showed that if V is a complex vector space, then 
for each operator on V there is a basis with respect to which the matrix 
of the operator is upper triangular (see 5.13). Now that we are dealing 
with inner-product spaces, we would like to know when there exists an 
orthonormal basis with respect to which we have an upper-triangular 
matrix. The next corollary shows that the existence of any basis with 
respect to which T has an upper-triangular matrix implies the existence 
of an orthonormal basis with this property. This result is true on both
real and complex vector spaces (though on a real vector space, the
hypothesis holds only for some operators).

6.27 Corollary: Suppose $T \in \mathcal{L}(V)$. If $T$ has an upper-triangular
matrix with respect to some basis of $V$, then $T$ has an upper-triangular
matrix with respect to some orthonormal basis of $V$.

Proof: Suppose $T$ has an upper-triangular matrix with respect to
some basis $(v_1, \dots, v_n)$ of $V$. Thus $\operatorname{span}(v_1, \dots, v_j)$ is invariant under
$T$ for each $j = 1, \dots, n$ (see 5.12).

Apply the Gram-Schmidt procedure to $(v_1, \dots, v_n)$, producing an
orthonormal basis $(e_1, \dots, e_n)$ of $V$. Because

$\operatorname{span}(e_1, \dots, e_j) = \operatorname{span}(v_1, \dots, v_j)$

for each $j$ (see 6.21), we conclude that $\operatorname{span}(e_1, \dots, e_j)$ is invariant under
$T$ for each $j = 1, \dots, n$. Thus, by 5.12, $T$ has an upper-triangular
matrix with respect to the orthonormal basis $(e_1, \dots, e_n)$. ■

The next result is an important application of the corollary above. 


6.28 Corollary: Suppose $V$ is a complex vector space and $T \in \mathcal{L}(V)$.
Then $T$ has an upper-triangular matrix with respect to some orthonormal
basis of $V$.

Proof: This follows immediately from 5.13 and 6.27. ■ 

This result is sometimes called Schur’s theorem. The German mathematician
Issai Schur published the first proof of this result in 1909.

Orthogonal Projections and Minimization Problems


If $U$ is a subset of $V$, then the orthogonal complement of $U$, denoted
$U^\perp$, is the set of all vectors in $V$ that are orthogonal to every
vector in $U$:

$U^\perp = \{v \in V : \langle v, u \rangle = 0 \text{ for all } u \in U\}$.

You should verify that $U^\perp$ is always a subspace of $V$, that $V^\perp = \{0\}$,
and that $\{0\}^\perp = V$. Also note that if $U_1 \subset U_2$, then $U_1^\perp \supset U_2^\perp$.

Recall that if $U_1, U_2$ are subspaces of $V$, then $V$ is the direct sum of
$U_1$ and $U_2$ (written $V = U_1 \oplus U_2$) if each element of $V$ can be written in
exactly one way as a vector in $U_1$ plus a vector in $U_2$. The next theorem
shows that every subspace of an inner-product space leads to a natural
direct sum decomposition of the whole space.

6.29 Theorem: If $U$ is a subspace of $V$, then

$V = U \oplus U^\perp$.

Proof: Suppose that $U$ is a subspace of $V$. First we will show that

6.30 $V = U + U^\perp$.

To do this, suppose $v \in V$. Let $(e_1, \dots, e_m)$ be an orthonormal basis
of $U$. Obviously

6.31 $v = \underbrace{\langle v, e_1 \rangle e_1 + \cdots + \langle v, e_m \rangle e_m}_{u} + \underbrace{v - \langle v, e_1 \rangle e_1 - \cdots - \langle v, e_m \rangle e_m}_{w}$.

Clearly $u \in U$. Because $(e_1, \dots, e_m)$ is an orthonormal list, for each $j$
we have

$\langle w, e_j \rangle = \langle v, e_j \rangle - \langle v, e_j \rangle = 0$.

Thus $w$ is orthogonal to every vector in $\operatorname{span}(e_1, \dots, e_m)$. In other
words, $w \in U^\perp$. Thus we have written $v = u + w$, where $u \in U$
and $w \in U^\perp$, completing the proof of 6.30.

If $v \in U \cap U^\perp$, then $v$ (which is in $U$) is orthogonal to every vector
in $U$ (including $v$ itself), which implies that $\langle v, v \rangle = 0$, which implies
that $v = 0$. Thus

6.32 $U \cap U^\perp = \{0\}$.

Now 6.30 and 6.32 imply that $V = U \oplus U^\perp$ (see 1.9). ■

The next corollary is an important consequence of the last theorem. 

6.33 Corollary: If $U$ is a subspace of $V$, then

$U = (U^\perp)^\perp$.

Proof: Suppose that $U$ is a subspace of $V$. First we will show that

6.34 $U \subset (U^\perp)^\perp$.

To do this, suppose that $u \in U$. Then $\langle u, v \rangle = 0$ for every $v \in U^\perp$ (by
the definition of $U^\perp$). Because $u$ is orthogonal to every vector in $U^\perp$,
we have $u \in (U^\perp)^\perp$, completing the proof of 6.34.

To prove the inclusion in the other direction, suppose $v \in (U^\perp)^\perp$.
By 6.29, we can write $v = u + w$, where $u \in U$ and $w \in U^\perp$. We have
$v - u = w \in U^\perp$. Because $v \in (U^\perp)^\perp$ and $u \in (U^\perp)^\perp$ (from 6.34), we
have $v - u \in (U^\perp)^\perp$. Thus $v - u \in U^\perp \cap (U^\perp)^\perp$, which implies that $v - u$
is orthogonal to itself, which implies that $v - u = 0$, which implies that
$v = u$, which implies that $v \in U$. Thus $(U^\perp)^\perp \subset U$, which along with
6.34 completes the proof. ■

Suppose $U$ is a subspace of $V$. The decomposition $V = U \oplus U^\perp$ given
by 6.29 means that each vector $v \in V$ can be written uniquely in the
form

$v = u + w$,

where $u \in U$ and $w \in U^\perp$. We use this decomposition to define an operator
on $V$, denoted $P_U$, called the orthogonal projection of $V$ onto $U$.
For $v \in V$, we define $P_U v$ to be the vector $u$ in the decomposition above.
In the notation introduced in the last chapter, we have $P_U = P_{U, U^\perp}$. You
should verify that $P_U \in \mathcal{L}(V)$ and that it has the following properties:

• $\operatorname{range} P_U = U$;

• $\operatorname{null} P_U = U^\perp$;

• $v - P_U v \in U^\perp$ for every $v \in V$;

• $P_U^2 = P_U$;

• $\|P_U v\| \le \|v\|$ for every $v \in V$.

Furthermore, from the decomposition 6.31 used in the proof of 6.29
we see that if $(e_1, \dots, e_m)$ is an orthonormal basis of $U$, then

6.35 $P_U v = \langle v, e_1 \rangle e_1 + \cdots + \langle v, e_m \rangle e_m$

for every $v \in V$.
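Formula 6.35 can be tried out directly. The sketch below is an illustration only: the subspace $U$ of $\mathbf{R}^3$, its orthonormal basis, the vector $v$, and the function name `project` are hypothetical choices, and exact rational arithmetic is used so that the listed properties of $P_U$ can be checked with exact equality.

```python
from fractions import Fraction

def inner(u, v):
    """Euclidean inner product on R^n (real scalars only)."""
    return sum(a * b for a, b in zip(u, v))

def project(v, onb):
    """P_U v computed from 6.35, where onb is an orthonormal basis of U."""
    p = [0] * len(v)
    for e in onb:
        c = inner(v, e)
        p = [pi + c * ei for pi, ei in zip(p, e)]
    return p

# Hypothetical subspace U of R^3, given by an orthonormal basis.
onb = [(Fraction(3, 5), Fraction(4, 5), 0), (0, 0, 1)]
v = (1, 2, 3)

p = project(v, onb)                    # the component u of v in U
w = [a - b for a, b in zip(v, p)]      # the component of v in the orthogonal complement

print(p, w)
print([inner(w, e) for e in onb])      # w is orthogonal to each e_j
print(project(p, onb) == p)            # applying P_U twice changes nothing
print(inner(p, p) <= inner(v, v))      # squared norms: ||P_U v||^2 <= ||v||^2
```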

The following problem often arises: given a subspace $U$ of $V$ and
a point $v \in V$, find a point $u \in U$ such that $\|v - u\|$ is as small as
possible. The next proposition shows that this minimization problem
is solved by taking $u = P_U v$.


6.36 Proposition: Suppose $U$ is a subspace of $V$ and $v \in V$. Then

$\|v - P_U v\| \le \|v - u\|$

for every $u \in U$. Furthermore, if $u \in U$ and the inequality above is an
equality, then $u = P_U v$.

Proof: Suppose $u \in U$. Then

6.37 $\|v - P_U v\|^2 \le \|v - P_U v\|^2 + \|P_U v - u\|^2$

6.38 $= \|(v - P_U v) + (P_U v - u)\|^2$

$= \|v - u\|^2$,

where 6.38 comes from the Pythagorean theorem (6.3), which applies
because $v - P_U v \in U^\perp$ and $P_U v - u \in U$. Taking square roots gives the
desired inequality.

Our inequality is an equality if and only if 6.37 is an equality, which
happens if and only if $\|P_U v - u\| = 0$, which happens if and only if
$u = P_U v$. ■

The remarkable simplicity of the solution to this minimization problem
has led to many applications of inner-product spaces outside of pure
mathematics.


[Figure: $P_U v$ is the closest point in $U$ to $v$.]

The last proposition is often combined with the formula 6.35 to
compute explicit solutions to minimization problems. As an illustration
of this procedure, consider the problem of finding a polynomial $u$
with real coefficients and degree at most 5 that on the interval $[-\pi, \pi]$
approximates $\sin x$ as well as possible, in the sense that

$\int_{-\pi}^{\pi} |\sin x - u(x)|^2 \, dx$

is as small as possible. To solve this problem, let $C[-\pi, \pi]$ denote the
real vector space of continuous real-valued functions on $[-\pi, \pi]$ with
inner product

6.39 $\langle f, g \rangle = \int_{-\pi}^{\pi} f(x) g(x) \, dx$.

Let $v \in C[-\pi, \pi]$ be the function defined by $v(x) = \sin x$. Let $U$
denote the subspace of $C[-\pi, \pi]$ consisting of the polynomials with
real coefficients and degree at most 5. Our problem can now be reformulated
as follows: find $u \in U$ such that $\|v - u\|$ is as small as
possible.

To compute the solution to our approximation problem, first apply
the Gram-Schmidt procedure (using the inner product given by 6.39)
to the basis $(1, x, x^2, x^3, x^4, x^5)$ of $U$, producing an orthonormal basis
$(e_1, e_2, e_3, e_4, e_5, e_6)$ of $U$. Then, again using the inner product given
by 6.39, compute $P_U v$ using 6.35 (with $m = 6$). Doing this computation
shows that $P_U v$ is the function

6.40 $0.987862x - 0.155271x^3 + 0.00564312x^5$,

where the $\pi$'s that appear in the exact answer have been replaced with
a good decimal approximation.

A machine that can perform integrations is useful here.

By 6.36, the polynomial above should be about as good an approximation
to $\sin x$ on $[-\pi, \pi]$ as is possible using polynomials of degree
at most 5. To see how good this approximation is, the picture below
shows the graphs of both $\sin x$ and our approximation 6.40 over the
interval $[-\pi, \pi]$.

[Figure: graphs of $\sin x$ and the approximation 6.40 over $[-\pi, \pi]$]

Our approximation 6.40 is so accurate that the two graphs are almost 
identical—our eyes may see only one graph! 
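The computation leading to 6.40 can be reproduced numerically. The following is a sketch under stated assumptions, not the method actually used to produce the numbers in the text: polynomials are stored as coefficient lists, the integrals in 6.39 are evaluated from closed forms (with an integration-by-parts recurrence for $\int_{-\pi}^{\pi} x^k \sin x \, dx$) rather than by a symbolic integrator, and all function names are invented for this example.

```python
import math

PI = math.pi
DEG = 5  # polynomials of degree at most 5, stored as [c0, c1, ..., c5]

def moment(k):
    """Integral of x**k over [-pi, pi]."""
    return 0.0 if k % 2 else 2 * PI ** (k + 1) / (k + 1)

def sin_moment(k):
    """Integral of x**k * sin(x) over [-pi, pi], via integration by parts."""
    if k % 2 == 0:
        return 0.0  # odd integrand on a symmetric interval
    if k == 1:
        return 2 * PI
    return 2 * PI ** k - k * (k - 1) * sin_moment(k - 2)

def inner(p, q):
    """Inner product 6.39 of two polynomials given as coefficient lists."""
    return sum(a * b * moment(i + j)
               for i, a in enumerate(p) for j, b in enumerate(q))

def inner_with_sin(p):
    """Inner product 6.39 of sin with the polynomial p."""
    return sum(a * sin_moment(i) for i, a in enumerate(p))

def gram_schmidt(polys):
    """Gram-Schmidt procedure (6.20) with respect to the inner product 6.39."""
    basis = []
    for v in polys:
        r = list(v)
        for e in basis:
            c = inner(v, e)
            r = [ri - c * ei for ri, ei in zip(r, e)]
        norm = math.sqrt(inner(r, r))
        basis.append([ri / norm for ri in r])
    return basis

# The basis (1, x, ..., x^5) of U as coefficient lists.
monomials = [[1.0 if i == k else 0.0 for i in range(DEG + 1)]
             for k in range(DEG + 1)]
onb = gram_schmidt(monomials)

# Formula 6.35 with m = 6: sum of <sin, e_j> e_j.
approx = [0.0] * (DEG + 1)
for e in onb:
    c = inner_with_sin(e)
    approx = [a + c * ei for a, ei in zip(approx, e)]

print([round(a, 8) for a in approx])
```

The printed coefficients of $x$, $x^3$, and $x^5$ should agree with 6.40 to the digits shown there, and the even-degree coefficients vanish because $\sin x$ is odd.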

Another well-known approximation to $\sin x$ by a polynomial of degree 5
is given by the Taylor polynomial

6.41 $x - \dfrac{x^3}{3!} + \dfrac{x^5}{5!}$.

To see how good this approximation is, the next picture shows the
graphs of both $\sin x$ and the Taylor polynomial 6.41 over the interval
$[-\pi, \pi]$.

[Figure: graphs of $\sin x$ and the Taylor polynomial 6.41]

The Taylor polynomial is an excellent approximation to $\sin x$ for $x$
near 0. But the picture above shows that for $|x| > 2$, the Taylor polynomial
is not so accurate, especially compared to 6.40. For example,
taking $x = 3$, our approximation 6.40 estimates $\sin 3$ with an error of
about 0.001, but the Taylor series 6.41 estimates $\sin 3$ with an error of
about 0.4. Thus at $x = 3$, the error in the Taylor series is hundreds of
times larger than the error given by 6.40. Linear algebra has helped us
discover an approximation to $\sin x$ that improves upon what we learned
in calculus!
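The two error figures quoted for $x = 3$ are easy to check. The sketch below evaluates the rounded coefficients of 6.40 and the degree-5 Taylor polynomial of $\sin x$ at that point; the function names are made up for this illustration.

```python
import math

def approx_640(x):
    """The polynomial 6.40, using the rounded coefficients given in the text."""
    return 0.987862 * x - 0.155271 * x**3 + 0.00564312 * x**5

def taylor_641(x):
    """The degree-5 Taylor polynomial of sin x at 0."""
    return x - x**3 / math.factorial(3) + x**5 / math.factorial(5)

x = 3.0
err_640 = abs(math.sin(x) - approx_640(x))
err_taylor = abs(math.sin(x) - taylor_641(x))

print(err_640)               # about 0.001
print(err_taylor)            # about 0.4
print(err_taylor / err_640)  # ratio of the two errors
```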

We derived our approximation 6.40 by using 6.35 and 6.36. Our
standing assumption that $V$ is finite dimensional fails when $V$ equals
$C[-\pi, \pi]$, so we need to justify our use of those results in this case.
First, reread the proof of 6.29, which states that if $U$ is a subspace of $V$,
then

6.42 $V = U \oplus U^\perp$.

If we allow $V$ to be infinite dimensional and allow $U$ to be an
infinite-dimensional subspace of $V$, then 6.42 is not necessarily
true without additional hypotheses.

Note that the proof uses the finite dimensionality of $U$ (to get a basis
of $U$) but that it works fine regardless of whether or not $V$ is finite
dimensional. Second, note that the definition and properties of $P_U$ (including
6.35) require only 6.29 and thus require only that $U$ (but not
necessarily $V$) be finite dimensional. Finally, note that the proof of 6.36
does not require the finite dimensionality of $V$. Conclusion: for $v \in V$
and $U$ a subspace of $V$, the procedure discussed above for finding the
vector $u \in U$ that makes $\|v - u\|$ as small as possible works if $U$ is finite
dimensional, regardless of whether or not $V$ is finite dimensional. In
the example above $U$ was indeed finite dimensional (we had $\dim U = 6$),
so everything works as expected.

Linear Functionals and Adjoints

A linear functional on $V$ is a linear map from $V$ to the scalars $\mathbf{F}$.
For example, the function $\varphi : \mathbf{F}^3 \to \mathbf{F}$ defined by

6.43 $\varphi(z_1, z_2, z_3) = 2z_1 - 5z_2 + z_3$

is a linear functional on $\mathbf{F}^3$. As another example, consider the inner-product
space $\mathcal{P}_6(\mathbf{R})$ (here the inner product is multiplication followed
by integration on [0,1]; see 6.2). The function $\varphi : \mathcal{P}_6(\mathbf{R}) \to \mathbf{R}$ defined
by

6.44 $\varphi(p) = \int_0^1 p(x) (\cos x) \, dx$

is a linear functional on $\mathcal{P}_6(\mathbf{R})$.

If $v \in V$, then the map that sends $u$ to $\langle u, v \rangle$ is a linear functional
on $V$. The next result shows that every linear functional on $V$ is of this
form. To illustrate this theorem, note that for the linear functional $\varphi$
defined by 6.43, we can take $v = (2, -5, 1) \in \mathbf{F}^3$. The linear functional
$\varphi$ defined by 6.44 better illustrates the power of the theorem below because
for this linear functional, there is no obvious candidate for $v$ (the
function $\cos x$ is not eligible because it is not an element of $\mathcal{P}_6(\mathbf{R})$).


6.45 Theorem: Suppose $\varphi$ is a linear functional on $V$. Then there is
a unique vector $v \in V$ such that

$\varphi(u) = \langle u, v \rangle$

for every $u \in V$.

Proof: First we show that there exists a vector $v \in V$ such that
$\varphi(u) = \langle u, v \rangle$ for every $u \in V$. Let $(e_1, \dots, e_n)$ be an orthonormal
basis of $V$. Then

$\varphi(u) = \varphi(\langle u, e_1 \rangle e_1 + \cdots + \langle u, e_n \rangle e_n)$

$= \langle u, e_1 \rangle \varphi(e_1) + \cdots + \langle u, e_n \rangle \varphi(e_n)$

$= \langle u, \overline{\varphi(e_1)} e_1 + \cdots + \overline{\varphi(e_n)} e_n \rangle$

for every $u \in V$, where the first equality comes from 6.17. Thus setting
$v = \overline{\varphi(e_1)} e_1 + \cdots + \overline{\varphi(e_n)} e_n$, we have $\varphi(u) = \langle u, v \rangle$ for every $u \in V$,
as desired.

Now we prove that only one vector $v \in V$ has the desired behavior.
Suppose $v_1, v_2 \in V$ are such that

$\varphi(u) = \langle u, v_1 \rangle = \langle u, v_2 \rangle$

for every $u \in V$. Then

$0 = \langle u, v_1 \rangle - \langle u, v_2 \rangle = \langle u, v_1 - v_2 \rangle$

for every $u \in V$. Taking $u = v_1 - v_2$ shows that $v_1 - v_2 = 0$. In other
words, $v_1 = v_2$, completing the proof of the uniqueness part of the
theorem. ■

The word adjoint has another meaning in linear algebra. We will
not need the second meaning, related to inverses, in this book.
Just in case you encountered the second meaning for adjoint elsewhere, be
warned that the two meanings for adjoint are unrelated to one another.
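The existence half of the proof of 6.45 is constructive, and the construction can be sketched for $V = \mathbf{F}^n$ with the standard basis playing the role of the orthonormal basis. The helper name `riesz_vector` is an assumption of this sketch; the functional `phi` is the one from 6.43, and the test vector `u` is an arbitrary choice.

```python
def inner(u, v):
    """Standard inner product on C^n: sum of u_j times the conjugate of v_j."""
    return sum(a * complex(b).conjugate() for a, b in zip(u, v))

def riesz_vector(phi, n):
    """Vector v with phi(u) = <u, v>, built as in the proof of 6.45
    from the standard (orthonormal) basis of F^n."""
    v = [0j] * n
    for j in range(n):
        e = [1 if i == j else 0 for i in range(n)]
        v[j] = complex(phi(e)).conjugate()  # coefficient of e_j is conj(phi(e_j))
    return v

def phi(z):
    """The linear functional 6.43 on F^3."""
    return 2 * z[0] - 5 * z[1] + z[2]

v = riesz_vector(phi, 3)
u = [1 + 2j, -1j, 3]

print(v)                     # the vector (2, -5, 1), as complex numbers
print(phi(u), inner(u, v))   # the two values agree
```

The complex conjugates matter only when some $\varphi(e_j)$ is not real; for 6.43 they have no visible effect.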

In addition to $V$, we need another finite-dimensional inner-product
space.

Let’s agree that for the rest of this chapter
$W$ is a finite-dimensional, nonzero, inner-product space over $\mathbf{F}$.

Let $T \in \mathcal{L}(V, W)$. The adjoint of $T$, denoted $T^*$, is the function from
$W$ to $V$ defined as follows. Fix $w \in W$. Consider the linear functional
on $V$ that maps $v \in V$ to $\langle Tv, w \rangle$. Let $T^* w$ be the unique vector in $V$
such that this linear functional is given by taking inner products with
$T^* w$ (6.45 guarantees the existence and uniqueness of a vector in $V$
with this property). In other words, $T^* w$ is the unique vector in $V$
such that

$\langle Tv, w \rangle = \langle v, T^* w \rangle$

for all $v \in V$.

Let’s work out an example of how the adjoint is computed. Define
$T : \mathbf{R}^3 \to \mathbf{R}^2$ by

$T(x_1, x_2, x_3) = (x_2 + 3x_3, 2x_1)$.

Thus $T^*$ will be a function from $\mathbf{R}^2$ to $\mathbf{R}^3$. To compute $T^*$, fix a point
$(y_1, y_2) \in \mathbf{R}^2$. Then

$\langle (x_1, x_2, x_3), T^*(y_1, y_2) \rangle = \langle T(x_1, x_2, x_3), (y_1, y_2) \rangle$

$= \langle (x_2 + 3x_3, 2x_1), (y_1, y_2) \rangle$

$= x_2 y_1 + 3x_3 y_1 + 2x_1 y_2$

$= \langle (x_1, x_2, x_3), (2y_2, y_1, 3y_1) \rangle$

for all $(x_1, x_2, x_3) \in \mathbf{R}^3$. This shows that

$T^*(y_1, y_2) = (2y_2, y_1, 3y_1)$.

Note that in the example above, $T^*$ turned out to be not just a function
from $\mathbf{R}^2$ to $\mathbf{R}^3$, but a linear map. That is true in general. Specifically,
if $T \in \mathcal{L}(V, W)$, then $T^* \in \mathcal{L}(W, V)$. To prove this, suppose
$T \in \mathcal{L}(V, W)$. Let’s begin by checking additivity. Fix $w_1, w_2 \in W$.
Then

$\langle Tv, w_1 + w_2 \rangle = \langle Tv, w_1 \rangle + \langle Tv, w_2 \rangle$

$= \langle v, T^* w_1 \rangle + \langle v, T^* w_2 \rangle$

$= \langle v, T^* w_1 + T^* w_2 \rangle$,

which shows that $T^* w_1 + T^* w_2$ plays the role required of $T^*(w_1 + w_2)$.
Because only one vector can behave that way, we must have

$T^* w_1 + T^* w_2 = T^*(w_1 + w_2)$.

Now let’s check the homogeneity of $T^*$. If $a \in \mathbf{F}$, then

$\langle Tv, aw \rangle = \bar{a} \langle Tv, w \rangle$

$= \bar{a} \langle v, T^* w \rangle$

$= \langle v, a T^* w \rangle$,

which shows that $a T^* w$ plays the role required of $T^*(aw)$. Because
only one vector can behave that way, we must have

$a T^* w = T^*(aw)$.

Thus $T^*$ is a linear map, as claimed.
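The adjoint found in the worked example, together with its additivity, can be spot-checked on random integer inputs. This is a sanity check rather than a proof; the function names `T` and `T_star` simply mirror the formulas $T(x_1, x_2, x_3) = (x_2 + 3x_3, 2x_1)$ and $T^*(y_1, y_2) = (2y_2, y_1, 3y_1)$ from the example, and the sample size and seed are arbitrary.

```python
import random

def T(x):
    """The linear map T(x1, x2, x3) = (x2 + 3*x3, 2*x1) from the example."""
    x1, x2, x3 = x
    return (x2 + 3 * x3, 2 * x1)

def T_star(y):
    """The adjoint computed in the example: T*(y1, y2) = (2*y2, y1, 3*y1)."""
    y1, y2 = y
    return (2 * y2, y1, 3 * y1)

def inner(u, v):
    """Euclidean inner product on R^n."""
    return sum(a * b for a, b in zip(u, v))

def add(u, v):
    return tuple(a + b for a, b in zip(u, v))

random.seed(0)
for _ in range(100):
    x = tuple(random.randint(-9, 9) for _ in range(3))
    y = tuple(random.randint(-9, 9) for _ in range(2))
    z = tuple(random.randint(-9, 9) for _ in range(2))
    # Defining property of the adjoint: <Tx, y> = <x, T*y>.
    assert inner(T(x), y) == inner(x, T_star(y))
    # Additivity of T*.
    assert T_star(add(y, z)) == add(T_star(y), T_star(z))

print("adjoint identity and additivity hold on all samples")
```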

You should verify that the function $T \mapsto T^*$ has the following properties:

additivity

$(S + T)^* = S^* + T^*$ for all $S, T \in \mathcal{L}(V, W)$;

conjugate homogeneity

$(aT)^* = \bar{a} T^*$ for all $a \in \mathbf{F}$ and $T \in \mathcal{L}(V, W)$;

adjoint of adjoint

$(T^*)^* = T$ for all $T \in \mathcal{L}(V, W)$;

identity

$I^* = I$, where $I$ is the identity operator on $V$;

products

$(ST)^* = T^* S^*$ for all $T \in \mathcal{L}(V, W)$ and $S \in \mathcal{L}(W, U)$ (here $U$ is an
inner-product space over $\mathbf{F}$).

Adjoints play a crucial role in the important results in the next chapter.

The next result shows the relationship between the null space and
the range of a linear map and its adjoint. The symbol $\iff$ means “if and
only if”; this symbol could also be read to mean “is equivalent to”.

6.46 Proposition: Suppose $T \in \mathcal{L}(V, W)$. Then

(a) $\operatorname{null} T^* = (\operatorname{range} T)^\perp$;

(b) $\operatorname{range} T^* = (\operatorname{null} T)^\perp$;

(c) $\operatorname{null} T = (\operatorname{range} T^*)^\perp$;

(d) $\operatorname{range} T = (\operatorname{null} T^*)^\perp$.

Proof: Let’s begin by proving (a). Let $w \in W$. Then

$w \in \operatorname{null} T^* \iff T^* w = 0$

$\iff \langle v, T^* w \rangle = 0$ for all $v \in V$

$\iff \langle Tv, w \rangle = 0$ for all $v \in V$

$\iff w \in (\operatorname{range} T)^\perp$.

Thus $\operatorname{null} T^* = (\operatorname{range} T)^\perp$, proving (a).

If we take the orthogonal complement of both sides of (a), we get (d),
where we have used 6.33. Finally, replacing $T$ with $T^*$ in (a) and (d) gives
(c) and (b). ■


The conjugate transpose of an m-by-n matrix is the n-by-m matrix
obtained by interchanging the rows and columns and then taking the
complex conjugate of each entry. For example, the conjugate transpose
of

$\begin{bmatrix} 2 & 3 + 4i & 7 \\ 6 & 5 & 8i \end{bmatrix}$

is the matrix

$\begin{bmatrix} 2 & 6 \\ 3 - 4i & 5 \\ 7 & -8i \end{bmatrix}$.

If $\mathbf{F} = \mathbf{R}$, then the conjugate transpose of a matrix is the same as
its transpose, which is the matrix obtained by interchanging the rows
and columns.

The next proposition shows how to compute the matrix of $T^*$ from
the matrix of $T$. Caution: the proposition below applies only when
we are dealing with orthonormal bases; with respect to nonorthonormal
bases, the matrix of $T^*$ does not necessarily equal the conjugate
transpose of the matrix of $T$.

6.47 Proposition: Suppose $T \in \mathcal{L}(V, W)$. If $(e_1, \dots, e_n)$ is an orthonormal
basis of $V$ and $(f_1, \dots, f_m)$ is an orthonormal basis of $W$,
then

$\mathcal{M}(T^*, (f_1, \dots, f_m), (e_1, \dots, e_n))$

is the conjugate transpose of

$\mathcal{M}(T, (e_1, \dots, e_n), (f_1, \dots, f_m))$.

Proof: Suppose that $(e_1, \dots, e_n)$ is an orthonormal basis of $V$ and
$(f_1, \dots, f_m)$ is an orthonormal basis of $W$. We write $\mathcal{M}(T)$ instead of the
longer expression $\mathcal{M}(T, (e_1, \dots, e_n), (f_1, \dots, f_m))$; we also write $\mathcal{M}(T^*)$
instead of $\mathcal{M}(T^*, (f_1, \dots, f_m), (e_1, \dots, e_n))$.

Recall that we obtain the $k$th column of $\mathcal{M}(T)$ by writing $Te_k$ as a linear
combination of the $f_j$'s; the scalars used in this linear combination
then become the $k$th column of $\mathcal{M}(T)$. Because $(f_1, \dots, f_m)$ is an orthonormal
basis of $W$, we know how to write $Te_k$ as a linear combination
of the $f_j$'s (see 6.17):

$Te_k = \langle Te_k, f_1 \rangle f_1 + \cdots + \langle Te_k, f_m \rangle f_m$.

Thus the entry in row $j$, column $k$, of $\mathcal{M}(T)$ is $\langle Te_k, f_j \rangle$. Replacing $T$
with $T^*$ and interchanging the roles played by the $e$'s and $f$'s, we see
that the entry in row $j$, column $k$, of $\mathcal{M}(T^*)$ is $\langle T^* f_k, e_j \rangle$, which equals
$\langle f_k, Te_j \rangle$, which equals $\overline{\langle Te_j, f_k \rangle}$, which equals the complex conjugate
of the entry in row $k$, column $j$, of $\mathcal{M}(T)$. In other words, $\mathcal{M}(T^*)$ equals
the conjugate transpose of $\mathcal{M}(T)$. ■


The adjoint of a linear map does not depend on a choice of basis.
This explains why we will emphasize adjoints of linear maps instead
of conjugate transposes of matrices.

Exercises

1. Prove that if $x, y$ are nonzero vectors in $\mathbf{R}^2$, then

$\langle x, y \rangle = \|x\| \|y\| \cos \theta$,

where $\theta$ is the angle between $x$ and $y$ (thinking of $x$ and $y$ as
arrows with initial point at the origin). Hint: draw the triangle
formed by $x$, $y$, and $x - y$; then use the law of cosines.

2. Suppose $u, v \in V$. Prove that $\langle u, v \rangle = 0$ if and only if

$\|u\| \le \|u + av\|$

for all $a \in \mathbf{F}$.

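3. Prove that

$\left( \sum_{j=1}^n a_j b_j \right)^2 \le \left( \sum_{j=1}^n j a_j^2 \right) \left( \sum_{j=1}^n \dfrac{b_j^2}{j} \right)$

for all real numbers $a_1, \dots, a_n$ and $b_1, \dots, b_n$.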

4. Suppose $u, v \in V$ are such that

$\|u\| = 3$, $\|u + v\| = 4$, $\|u - v\| = 6$.

What number must $\|v\|$ equal?

5. Prove or disprove: there is an inner product on $\mathbf{R}^2$ such that the
associated norm is given by

$\|(x_1, x_2)\| = |x_1| + |x_2|$

for all $(x_1, x_2) \in \mathbf{R}^2$.

6. Prove that if $V$ is a real inner-product space, then

$\langle u, v \rangle = \dfrac{\|u + v\|^2 - \|u - v\|^2}{4}$

for all $u, v \in V$.

7. Prove that if $V$ is a complex inner-product space, then

$\langle u, v \rangle = \dfrac{\|u + v\|^2 - \|u - v\|^2 + \|u + iv\|^2 i - \|u - iv\|^2 i}{4}$


for all $u, v \in V$.

8. A norm on a vector space $U$ is a function $\| \ \| : U \to [0, \infty)$ such
that $\|u\| = 0$ if and only if $u = 0$, $\|\alpha u\| = |\alpha| \|u\|$ for all $\alpha \in \mathbf{F}$
and all $u \in U$, and $\|u + v\| \le \|u\| + \|v\|$ for all $u, v \in U$. Prove
that a norm satisfying the parallelogram equality comes from
an inner product (in other words, show that if $\| \ \|$ is a norm
on $U$ satisfying the parallelogram equality, then there is an inner
product $\langle \ , \ \rangle$ on $U$ such that $\|u\| = \langle u, u \rangle^{1/2}$ for all $u \in U$).

9. Suppose $n$ is a positive integer. Prove that

$\left( \dfrac{1}{\sqrt{2\pi}}, \dfrac{\sin x}{\sqrt{\pi}}, \dfrac{\sin 2x}{\sqrt{\pi}}, \dots, \dfrac{\sin nx}{\sqrt{\pi}}, \dfrac{\cos x}{\sqrt{\pi}}, \dfrac{\cos 2x}{\sqrt{\pi}}, \dots, \dfrac{\cos nx}{\sqrt{\pi}} \right)$

is an orthonormal list of vectors in $C[-\pi, \pi]$, the vector space of
continuous real-valued functions on $[-\pi, \pi]$ with inner product

$\langle f, g \rangle = \int_{-\pi}^{\pi} f(x) g(x) \, dx$.

This orthonormal list is often used for modeling periodic
phenomena such as tides.

10. On $\mathcal{P}_2(\mathbf{R})$, consider the inner product given by

$\langle p, q \rangle = \int_0^1 p(x) q(x) \, dx$.

Apply the Gram-Schmidt procedure to the basis $(1, x, x^2)$ to produce
an orthonormal basis of $\mathcal{P}_2(\mathbf{R})$.


11. What happens if the Gram-Schmidt procedure is applied to a list 
of vectors that is not linearly independent? 

12. Suppose $V$ is a real inner-product space and $(v_1, \dots, v_m)$ is a
linearly independent list of vectors in $V$. Prove that there exist
exactly $2^m$ orthonormal lists $(e_1, \dots, e_m)$ of vectors in $V$ such
that

$\operatorname{span}(v_1, \dots, v_j) = \operatorname{span}(e_1, \dots, e_j)$

for all $j \in \{1, \dots, m\}$.

13. Suppose $(e_1, \dots, e_m)$ is an orthonormal list of vectors in $V$. Let
$v \in V$. Prove that

$\|v\|^2 = |\langle v, e_1 \rangle|^2 + \cdots + |\langle v, e_m \rangle|^2$


if and only if $v \in \operatorname{span}(e_1, \dots, e_m)$.

14. Find an orthonormal basis of $\mathcal{P}_2(\mathbf{R})$ (with inner product as in
Exercise 10) such that the differentiation operator (the operator
that takes $p$ to $p'$) on $\mathcal{P}_2(\mathbf{R})$ has an upper-triangular matrix with
respect to this basis.

15. Suppose $U$ is a subspace of $V$. Prove that

$\dim U^\perp = \dim V - \dim U$.

16. Suppose $U$ is a subspace of $V$. Prove that $U^\perp = \{0\}$ if and only if
$U = V$.

17. Prove that if $P \in \mathcal{L}(V)$ is such that $P^2 = P$ and every vector
in $\operatorname{null} P$ is orthogonal to every vector in $\operatorname{range} P$, then $P$ is an
orthogonal projection.

18. Prove that if $P \in \mathcal{L}(V)$ is such that $P^2 = P$ and

$\|Pv\| \le \|v\|$

for every $v \in V$, then $P$ is an orthogonal projection.

19. Suppose $T \in \mathcal{L}(V)$ and $U$ is a subspace of $V$. Prove that $U$ is
invariant under $T$ if and only if $P_U T P_U = T P_U$.

20. Suppose $T \in \mathcal{L}(V)$ and $U$ is a subspace of $V$. Prove that $U$ and
$U^\perp$ are both invariant under $T$ if and only if $P_U T = T P_U$.

21. In $\mathbf{R}^4$, let

$U = \operatorname{span}((1, 1, 0, 0), (1, 1, 1, 2))$.

Find $u \in U$ such that $\|u - (1, 2, 3, 4)\|$ is as small as possible.

22. Find $p \in \mathcal{P}_3(\mathbf{R})$ such that $p(0) = 0$, $p'(0) = 0$, and

$\int_0^1 |2 + 3x - p(x)|^2 \, dx$

is as small as possible.

23. Find $p \in \mathcal{P}_5(\mathbf{R})$ that makes

$\int_{-\pi}^{\pi} |\sin x - p(x)|^2 \, dx$

as small as possible. (The polynomial 6.40 is an excellent approximation
to the answer to this exercise, but here you are asked to
find the exact solution, which involves powers of $\pi$. A computer
that can perform symbolic integration will be useful.)

24. Find a polynomial $q \in \mathcal{P}_2(\mathbf{R})$ such that

$p(\tfrac12) = \int_0^1 p(x) q(x) \, dx$

for every $p \in \mathcal{P}_2(\mathbf{R})$.

25. Find a polynomial $q \in \mathcal{P}_2(\mathbf{R})$ such that

$\int_0^1 p(x) (\cos \pi x) \, dx = \int_0^1 p(x) q(x) \, dx$

for every $p \in \mathcal{P}_2(\mathbf{R})$.

26. Fix a vector $v \in V$ and define $T \in \mathcal{L}(V, \mathbf{F})$ by $Tu = \langle u, v \rangle$. For
$a \in \mathbf{F}$, find a formula for $T^* a$.

27. Suppose $n$ is a positive integer. Define $T \in \mathcal{L}(\mathbf{F}^n)$ by

$T(z_1, \dots, z_n) = (0, z_1, \dots, z_{n-1})$.

Find a formula for $T^*(z_1, \dots, z_n)$.

28. Suppose $T \in \mathcal{L}(V)$ and $\lambda \in \mathbf{F}$. Prove that $\lambda$ is an eigenvalue of $T$
if and only if $\bar{\lambda}$ is an eigenvalue of $T^*$.

29. Suppose $T \in \mathcal{L}(V)$ and $U$ is a subspace of $V$. Prove that $U$ is
invariant under $T$ if and only if $U^\perp$ is invariant under $T^*$.

30. Suppose $T \in \mathcal{L}(V, W)$. Prove that

(a) $T$ is injective if and only if $T^*$ is surjective;

(b) $T$ is surjective if and only if $T^*$ is injective.

31. Prove that

$\dim \operatorname{null} T^* = \dim \operatorname{null} T + \dim W - \dim V$

and

$\dim \operatorname{range} T^* = \dim \operatorname{range} T$

for every $T \in \mathcal{L}(V, W)$.

32. Suppose $A$ is an m-by-n matrix of real numbers. Prove that the
dimension of the span of the columns of $A$ (in $\mathbf{R}^m$) equals the
dimension of the span of the rows of $A$ (in $\mathbf{R}^n$).

Chapter 7. Operators on Inner-Product Spaces

The deepest results related to inner-product spaces deal with the
subject to which we now turn: operators on inner-product spaces. By
exploiting properties of the adjoint, we will develop a detailed description
of several important classes of operators on inner-product spaces.

Recall that $\mathbf{F}$ denotes $\mathbf{R}$ or $\mathbf{C}$.

Let’s agree that for this chapter
$V$ is a finite-dimensional, nonzero, inner-product space over $\mathbf{F}$.


Self-Adjoint and Normal Operators


Instead of self-adjoint, 
some mathematicians 
use the term Hermitian 
(in honor of the French 
mathematician Charles 
Hermite, who in 1873 
published the first 
proof that e is not the 
root of any polynomial 
with integer 
coefficients). 


An operator $T \in \mathcal{L}(V)$ is called self-adjoint if $T = T^*$. For example,
if $T$ is the operator on $\mathbf{F}^2$ whose matrix (with respect to the standard
basis) is

$\begin{bmatrix} 2 & b \\ 3 & 7 \end{bmatrix}$,

then $T$ is self-adjoint if and only if $b = 3$ (because $\mathcal{M}(T) = \mathcal{M}(T^*)$ if and
only if $b = 3$; recall that $\mathcal{M}(T^*)$ is the conjugate transpose of $\mathcal{M}(T)$;
see 6.47).

You should verify that the sum of two self-adjoint operators is self-adjoint
and that the product of a real scalar and a self-adjoint operator
is self-adjoint.

A good analogy to keep in mind (especially when $\mathbf{F} = \mathbf{C}$) is that
the adjoint on $\mathcal{L}(V)$ plays a role similar to complex conjugation on $\mathbf{C}$.
A complex number $z$ is real if and only if $z = \bar{z}$; thus a self-adjoint
operator ($T = T^*$) is analogous to a real number. We will see that
this analogy is reflected in some important properties of self-adjoint
operators, beginning with eigenvalues.


If F = R, then by 
definition every 
eigenvalue is real, so 
this proposition is 
interesting only when 
F = C. 


7.1 Proposition: Every eigenvalue of a self-adjoint operator is real.

Proof: Suppose $T$ is a self-adjoint operator on $V$. Let $\lambda$ be an
eigenvalue of $T$, and let $v$ be a nonzero vector in $V$ such that $Tv = \lambda v$.
Then

$\lambda \|v\|^2 = \langle \lambda v, v \rangle$

$= \langle Tv, v \rangle$

$= \langle v, Tv \rangle$

$= \langle v, \lambda v \rangle$

$= \bar{\lambda} \|v\|^2$.

Thus $\lambda = \bar{\lambda}$, which means that $\lambda$ is real, as desired. ■


The next proposition is false for real inner-product spaces. As an
example, consider the operator $T \in \mathcal{L}(\mathbf{R}^2)$ that is a counterclockwise
rotation of 90° around the origin; thus $T(x, y) = (-y, x)$. Obviously
$Tv$ is orthogonal to $v$ for every $v \in \mathbf{R}^2$, even though $T$ is not 0.

7.2 Proposition: If $V$ is a complex inner-product space and $T$ is an
operator on $V$ such that

$\langle Tv, v \rangle = 0$

for all $v \in V$, then $T = 0$.

Proof: Suppose $V$ is a complex inner-product space and $T \in \mathcal{L}(V)$.
Then

$\langle Tu, w \rangle = \dfrac{\langle T(u + w), u + w \rangle - \langle T(u - w), u - w \rangle}{4}$

$+ \dfrac{\langle T(u + iw), u + iw \rangle - \langle T(u - iw), u - iw \rangle}{4} \, i$

for all $u, w \in V$, as can be verified by computing the right side. Note
that each term on the right side is of the form $\langle Tv, v \rangle$ for appropriate
$v \in V$. If $\langle Tv, v \rangle = 0$ for all $v \in V$, then the equation above implies that
$\langle Tu, w \rangle = 0$ for all $u, w \in V$. This implies that $T = 0$ (take $w = Tu$). ■


The following corollary is false for real inner-product spaces, as 
shown by considering any operator on a real inner-product space that 
is not self-adjoint. 


7.3 Corollary: Let $V$ be a complex inner-product space and let
$T \in \mathcal{L}(V)$. Then $T$ is self-adjoint if and only if

$\langle Tv, v \rangle \in \mathbf{R}$

for every $v \in V$.

This corollary provides another example of how self-adjoint
operators behave like real numbers.

Proof: Let $v \in V$. Then

$\langle Tv, v \rangle - \overline{\langle Tv, v \rangle} = \langle Tv, v \rangle - \langle v, Tv \rangle$

$= \langle Tv, v \rangle - \langle T^* v, v \rangle$

$= \langle (T - T^*)v, v \rangle$.

If $\langle Tv, v \rangle \in \mathbf{R}$ for every $v \in V$, then the left side of the equation above
equals 0, so $\langle (T - T^*)v, v \rangle = 0$ for every $v \in V$. This implies that
$T - T^* = 0$ (by 7.2), and hence $T$ is self-adjoint.

Conversely, if $T$ is self-adjoint, then the right side of the equation
above equals 0, so $\langle Tv, v \rangle = \overline{\langle Tv, v \rangle}$ for every $v \in V$. This implies that
$\langle Tv, v \rangle \in \mathbf{R}$ for every $v \in V$, as desired. ■

On a real inner-product space $V$, a nonzero operator $T$ may satisfy
$\langle Tv, v \rangle = 0$ for all $v \in V$. However, the next proposition shows that
this cannot happen for a self-adjoint operator.

7.4 Proposition: If $T$ is a self-adjoint operator on $V$ such that

$\langle Tv, v \rangle = 0$

for all $v \in V$, then $T = 0$.

Proof: We have already proved this (without the hypothesis that
$T$ is self-adjoint) when $V$ is a complex inner-product space (see 7.2).
Thus we can assume that $V$ is a real inner-product space and that $T$ is
a self-adjoint operator on $V$. For $u, w \in V$, we have

7.5 $\langle Tu, w \rangle = \dfrac{\langle T(u + w), u + w \rangle - \langle T(u - w), u - w \rangle}{4}$;

this is proved by computing the right side, using

$\langle Tw, u \rangle = \langle w, Tu \rangle$

$= \langle Tu, w \rangle$,

where the first equality holds because $T$ is self-adjoint and the second
equality holds because we are working on a real inner-product space.
If $\langle Tv, v \rangle = 0$ for all $v \in V$, then 7.5 implies that $\langle Tu, w \rangle = 0$ for all
$u, w \in V$. This implies that $T = 0$ (take $w = Tu$). ■

An operator on an inner-product space is called normal if it commutes
with its adjoint; in other words, T ∈ L(V) is normal if

\[ TT^* = T^*T. \]

Obviously every self-adjoint operator is normal. For an example of a
normal operator that is not self-adjoint, consider the operator on F^2
whose matrix (with respect to the standard basis) is

\[ \begin{bmatrix} 2 & -3 \\ 3 & 2 \end{bmatrix}. \]

Clearly this operator is not self-adjoint, but an easy calculation (which
you should do) shows that it is normal.
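The easy calculation can also be done by machine. The following is a
minimal sketch, assuming the operator is represented by its matrix with
respect to the standard (orthonormal) basis, so that the adjoint is the
conjugate transpose; the helper names matmul and adjoint are made up
for this example.

```python
# Check that the matrix [[2, -3], [3, 2]] is normal but not self-adjoint.

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def adjoint(A):
    """Conjugate transpose (matrix of the adjoint in an orthonormal basis)."""
    return [[A[j][i].conjugate() for j in range(len(A))]
            for i in range(len(A[0]))]

M = [[2, -3], [3, 2]]
M_star = adjoint(M)

is_self_adjoint = (M == M_star)
is_normal = (matmul(M, M_star) == matmul(M_star, M))
print(is_self_adjoint, is_normal, matmul(M, M_star))
```

Both products come out as 13 times the identity matrix, which is why
the two orders of multiplication agree.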

We will soon see why normal operators are worthy of special attention.
The next proposition provides a simple characterization of
normal operators.

7.6 Proposition: An operator T ∈ L(V) is normal if and only if

\[ \|Tv\| = \|T^*v\| \]

for all v ∈ V.

Proof: Let T ∈ L(V). We will prove both directions of this result
at the same time. Note that

\[
\begin{aligned}
T \text{ is normal}
&\iff T^*T - TT^* = 0 \\
&\iff \langle (T^*T - TT^*)v, v \rangle = 0 \quad \text{for all } v \in V \\
&\iff \langle T^*Tv, v \rangle = \langle TT^*v, v \rangle \quad \text{for all } v \in V \\
&\iff \|Tv\|^2 = \|T^*v\|^2 \quad \text{for all } v \in V,
\end{aligned}
\]

where we used 7.4 to establish the second equivalence (note that the
operator T*T − TT* is self-adjoint). The equivalence of the first and
last conditions above gives the desired result. ■
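To see the characterization in 7.6 at work, the sketch below compares
‖Tv‖² and ‖T*v‖² for one normal matrix and one matrix that is not
normal. It is an illustration under stated assumptions: operators on
R^2 are given by matrices with respect to the standard basis, and the
names apply, adjoint, norm_sq, and norms_agree, as well as the sample
matrices and vectors, are hypothetical.

```python
# Compare ||Tv||^2 with ||T*v||^2 on a few sample vectors.

def apply(A, v):
    """Matrix-vector product A v."""
    return [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]

def adjoint(A):
    """Conjugate transpose of A."""
    return [[A[j][i].conjugate() for j in range(len(A))]
            for i in range(len(A[0]))]

def norm_sq(v):
    """Squared norm of v."""
    return sum((x * x.conjugate()).real for x in v)

normal_T = [[2, -3], [3, 2]]       # normal (commutes with its adjoint)
non_normal_N = [[1, 1], [0, 1]]    # not normal
vectors = [[1, 0], [0, 1], [1, 2], [-3, 5]]

def norms_agree(A):
    """True if ||Av||^2 == ||A*v||^2 for every sample vector."""
    A_star = adjoint(A)
    return all(norm_sq(apply(A, v)) == norm_sq(apply(A_star, v))
               for v in vectors)

print(norms_agree(normal_T), norms_agree(non_normal_N))
```

A finite list of sample vectors cannot prove normality, of course; the
check only illustrates the two sides of the equivalence.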

Compare the next corollary to Exercise 28 in the previous chapter. 
That exercise implies that the eigenvalues of the adjoint of any operator 
are equal (as a set) to the complex conjugates of the eigenvalues of the 
operator. The exercise says nothing about eigenvectors because an 
operator and its adjoint may have different eigenvectors. However, the 
next corollary implies that a normal operator and its adjoint have the 
same eigenvectors. 

7.7 Corollary: Suppose T ∈ L(V) is normal. If v ∈ V is an eigenvector
of T with eigenvalue λ ∈ F, then v is also an eigenvector of T*
with eigenvalue λ̄ (the complex conjugate of λ).

Proof: Suppose v ∈ V is an eigenvector of T with eigenvalue λ.
Thus (T − λI)v = 0. Because T is normal, so is T − λI, as you should
verify. Using 7.6, we have

\[
0 = \|(T - \lambda I)v\| = \|(T - \lambda I)^*v\| = \|(T^* - \bar{\lambda} I)v\|,
\]

and hence v is an eigenvector of T* with eigenvalue λ̄, as desired. ■

Because every self-adjoint operator is normal, the next result applies 
in particular to self-adjoint operators. 


Note that this proposition implies that null T = null T*
for every normal operator T.

7.8 Corollary: If T ∈ L(V) is normal, then eigenvectors of T
corresponding to distinct eigenvalues are orthogonal.

Proof: Suppose T ∈ L(V) is normal and α, β are distinct eigenvalues
of T, with corresponding eigenvectors u, v. Thus Tu = αu and
Tv = βv. From 7.7 we have T*v = β̄v. Thus

\[
\begin{aligned}
(\alpha - \beta)\langle u, v \rangle
&= \langle \alpha u, v \rangle - \langle u, \bar{\beta} v \rangle \\
&= \langle Tu, v \rangle - \langle u, T^*v \rangle \\
&= 0.
\end{aligned}
\]

Because α ≠ β, the equation above implies that ⟨u, v⟩ = 0. Thus u and
v are orthogonal, as desired. ■

The Spectral Theorem

Recall that a diagonal matrix is a square matrix that is 0 everywhere 
except possibly along the diagonal. Recall also that an operator on V 
has a diagonal matrix with respect to some basis if and only if there is 
a basis of V consisting of eigenvectors of the operator (see 5.21). 

The nicest operators on V are those for which there is an orthonormal
basis of V with respect to which the operator has a diagonal
matrix. These are precisely the operators T ∈ L(V) such that there is
an orthonormal basis of V consisting of eigenvectors of T. Our goal
in this section is to prove the spectral theorem, which characterizes
these operators as the normal operators when F = C and as the
self-adjoint operators when F = R. The spectral theorem is probably the
most useful tool in the study of operators on inner-product spaces.

Because the conclusion of the spectral theorem depends on F, we 
will break the spectral theorem into two pieces, called the complex 
spectral theorem and the real spectral theorem. As is often the case in 
linear algebra, complex vector spaces are easier to deal with than real 
vector spaces, so we present the complex spectral theorem first. 

As an illustration of the complex spectral theorem, consider the
normal operator T ∈ L(C^2) whose matrix (with respect to the standard
basis) is

\[ \begin{bmatrix} 2 & -3 \\ 3 & 2 \end{bmatrix}. \]

You should verify that

\[ \left( \frac{(i, 1)}{\sqrt{2}}, \frac{(-i, 1)}{\sqrt{2}} \right) \]

is an orthonormal basis of C^2 consisting of eigenvectors of T and that
with respect to this basis, the matrix of T is the diagonal matrix

\[ \begin{bmatrix} 2 + 3i & 0 \\ 0 & 2 - 3i \end{bmatrix}. \]
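The verification requested above can be sketched in a few lines. The
code below is an illustration, not part of the original text: it works
with the unnormalized vectors (i, 1) and (−i, 1) so that all arithmetic
is exact, and the helper names (inner, apply, adjoint, is_eigenpair) are
made up for this example. It also checks 7.7 and 7.8 on this operator.

```python
# Eigenvectors of [[2, -3], [3, 2]] on C^2, checked with exact arithmetic.

def inner(x, y):
    """Standard inner product on C^n, linear in the first slot."""
    return sum(a * complex(b).conjugate() for a, b in zip(x, y))

def apply(A, v):
    """Matrix-vector product A v."""
    return [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]

def adjoint(A):
    """Conjugate transpose of A."""
    return [[complex(A[j][i]).conjugate() for j in range(len(A))]
            for i in range(len(A[0]))]

def is_eigenpair(A, v, lam):
    """True if A v == lam * v."""
    return apply(A, v) == [lam * x for x in v]

T = [[2, -3], [3, 2]]
u1, lam1 = [1j, 1], 2 + 3j     # (i, 1); divide by sqrt(2) to normalize
u2, lam2 = [-1j, 1], 2 - 3j    # (-i, 1); divide by sqrt(2) to normalize

eigen_ok = is_eigenpair(T, u1, lam1) and is_eigenpair(T, u2, lam2)
# 7.7: the same vectors are eigenvectors of T* with conjugated eigenvalues.
adjoint_ok = (is_eigenpair(adjoint(T), u1, lam1.conjugate())
              and is_eigenpair(adjoint(T), u2, lam2.conjugate()))
# 7.8: eigenvectors for distinct eigenvalues are orthogonal.
orthogonality = inner(u1, u2)
norms_squared = (inner(u1, u1), inner(u2, u2))
print(eigen_ok, adjoint_ok, orthogonality, norms_squared)
```

Each unnormalized vector has squared norm 2, which accounts for the
factor 1/√2 in the orthonormal basis.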


7.9 Complex Spectral Theorem: Suppose that V is a complex
inner-product space and T ∈ L(V). Then V has an orthonormal basis
consisting of eigenvectors of T if and only if T is normal.

Proof: First suppose that V has an orthonormal basis consisting of
eigenvectors of T. With respect to this basis, T has a diagonal matrix.
The matrix of T* (with respect to the same basis) is obtained by taking
the conjugate transpose of the matrix of T; hence T* also has a diagonal
matrix. Any two diagonal matrices commute; thus T commutes
with T*, which means that T must be normal, as desired.

To prove the other direction, now suppose that T is normal. There
is an orthonormal basis (e_1, ..., e_n) of V with respect to which T has
an upper-triangular matrix (by 6.28). Thus we can write

\[
\mathcal{M}\bigl(T, (e_1, \dots, e_n)\bigr) =
\begin{bmatrix}
a_{1,1} & \dots & a_{1,n} \\
 & \ddots & \vdots \\
0 & & a_{n,n}
\end{bmatrix}.
\tag{7.10}
\]

We will show that this matrix is actually a diagonal matrix, which means
that (e_1, ..., e_n) is an orthonormal basis of V consisting of eigenvectors
of T.

We see from the matrix above that

\[ \|Te_1\|^2 = |a_{1,1}|^2 \]

and

\[ \|T^*e_1\|^2 = |a_{1,1}|^2 + |a_{1,2}|^2 + \dots + |a_{1,n}|^2. \]

Because T is normal, ‖Te_1‖ = ‖T*e_1‖ (see 7.6). Thus the two equations
above imply that all entries in the first row of the matrix in 7.10, except
possibly the first entry a_{1,1}, equal 0.

Now from 7.10 we see that

\[ \|Te_2\|^2 = |a_{2,2}|^2 \]

(because a_{1,2} = 0, as we showed in the paragraph above) and

\[ \|T^*e_2\|^2 = |a_{2,2}|^2 + |a_{2,3}|^2 + \dots + |a_{2,n}|^2. \]

Because T is normal, ‖Te_2‖ = ‖T*e_2‖. Thus the two equations above
imply that all entries in the second row of the matrix in 7.10, except
possibly the diagonal entry a_{2,2}, equal 0.

Continuing in this fashion, we see that all the nondiagonal entries
in the matrix 7.10 equal 0, as desired. ■

Because every self-adjoint operator is normal, the complex spectral
theorem implies that every self-adjoint operator on a finite-dimensional
complex inner-product space has a diagonal matrix with respect to
some orthonormal basis.

We will need two lemmas for our proof of the real spectral theorem.
You could guess that the next lemma is true and even discover its
proof by thinking about quadratic polynomials with real coefficients.
Specifically, suppose α, β ∈ R and α² < 4β. Let x be a real number.
Then

\[
\begin{aligned}
x^2 + \alpha x + \beta
&= \Bigl(x + \frac{\alpha}{2}\Bigr)^2 + \Bigl(\beta - \frac{\alpha^2}{4}\Bigr) \\
&> 0.
\end{aligned}
\]

In particular, x² + αx + β is an invertible real number (a convoluted
way of saying that it is not 0). Replacing the real number x with a
self-adjoint operator (recall the analogy between real numbers and
self-adjoint operators), we are led to the lemma below.

7.11 Lemma: Suppose T ∈ L(V) is self-adjoint. If α, β ∈ R are such
that α² < 4β, then

\[ T^2 + \alpha T + \beta I \]

is invertible.

Proof: Suppose α, β ∈ R are such that α² < 4β. Let v be a nonzero
vector in V. Then

\[
\begin{aligned}
\langle (T^2 + \alpha T + \beta I)v, v \rangle
&= \langle T^2 v, v \rangle + \alpha \langle Tv, v \rangle + \beta \langle v, v \rangle \\
&= \langle Tv, Tv \rangle + \alpha \langle Tv, v \rangle + \beta \|v\|^2 \\
&\ge \|Tv\|^2 - |\alpha| \, \|Tv\| \, \|v\| + \beta \|v\|^2 \\
&= \Bigl( \|Tv\| - \frac{|\alpha| \, \|v\|}{2} \Bigr)^2
 + \Bigl( \beta - \frac{\alpha^2}{4} \Bigr) \|v\|^2 \\
&> 0,
\end{aligned}
\]

where the first inequality holds by the Cauchy-Schwarz inequality (6.6).
The last inequality implies that (T² + αT + βI)v ≠ 0. Thus T² + αT + βI
is injective, which implies that it is invertible (see 3.21). ■

This technique of completing the square can be used to derive
the quadratic formula.
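As a concrete sanity check of the estimate in this proof, the sketch
below forms T² + αT + βI for one symmetric matrix and confirms, on a
grid of integer vectors, both that the quadratic form is positive and
that it dominates the lower bound (β − α²/4)‖v‖² obtained from
completing the square. The matrix, the values of α and β, and the
helper names are hypothetical choices made for this example.

```python
# Check <(T^2 + aT + bI)v, v> >= (b - a^2/4) ||v||^2 > 0 on sample vectors.

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def quad_form(A, v):
    """The real quantity <Av, v> for a real matrix A and real vector v."""
    Av = [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]
    return sum(x * y for x, y in zip(Av, v))

T = [[2, 1], [1, -1]]      # symmetric, hence self-adjoint on R^2
alpha, beta = 1, 1         # alpha^2 = 1 < 4 = 4 * beta

T_squared = matmul(T, T)
Q = [[T_squared[i][j] + alpha * T[i][j] + (beta if i == j else 0)
      for j in range(2)] for i in range(2)]

samples = [(x, y) for x in range(-3, 4) for y in range(-3, 4)
           if (x, y) != (0, 0)]
positive = all(quad_form(Q, v) > 0 for v in samples)
lower_bound_holds = all(
    quad_form(Q, v) >= (beta - alpha ** 2 / 4) * (v[0] ** 2 + v[1] ** 2)
    for v in samples)
print(Q, positive, lower_bound_holds)
```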

We have proved that every operator, self-adjoint or not, on a
finite-dimensional complex vector space has an eigenvalue (see 5.10), so the
next lemma tells us something new only for real inner-product spaces.

7.12 Lemma: Suppose T ∈ L(V) is self-adjoint. Then T has an
eigenvalue.


Proof: As noted above, we can assume that V is a real inner-product
space. Let n = dim V and choose v ∈ V with v ≠ 0. Then

\[ (v, Tv, T^2 v, \dots, T^n v) \]

cannot be linearly independent because V has dimension n and we have
n + 1 vectors. Thus there exist real numbers a_0, ..., a_n, not all 0, such
that

\[ 0 = a_0 v + a_1 Tv + \dots + a_n T^n v. \]

Here we are imitating the proof that T has an invariant subspace of
dimension 1 or 2 (see 5.24).

Make the a's the coefficients of a polynomial, which can be written in
factored form (see 4.14) as

\[
a_0 + a_1 x + \dots + a_n x^n
= c (x^2 + \alpha_1 x + \beta_1) \cdots (x^2 + \alpha_M x + \beta_M)
 (x - \lambda_1) \cdots (x - \lambda_m),
\]

where c is a nonzero real number, each α_j, β_j, and λ_j is real, each
α_j² < 4β_j, m + M ≥ 1, and the equation holds for all real x. We then
have

\[
\begin{aligned}
0 &= a_0 v + a_1 Tv + \dots + a_n T^n v \\
&= (a_0 I + a_1 T + \dots + a_n T^n) v \\
&= c (T^2 + \alpha_1 T + \beta_1 I) \cdots (T^2 + \alpha_M T + \beta_M I)
 (T - \lambda_1 I) \cdots (T - \lambda_m I) v.
\end{aligned}
\]

Each T² + α_j T + β_j I is invertible because T is self-adjoint and each
α_j² < 4β_j (see 7.11). Recall also that c ≠ 0. Thus the equation above
implies that

\[ 0 = (T - \lambda_1 I) \cdots (T - \lambda_m I) v. \]

Hence T − λ_j I is not injective for at least one j. In other words, T has
an eigenvalue. ■

As an illustration of the real spectral theorem, consider the
self-adjoint operator T on R^3 whose matrix (with respect to the standard
basis) is

\[
\begin{bmatrix}
14 & -13 & 8 \\
-13 & 14 & 8 \\
8 & 8 & -7
\end{bmatrix}.
\]

You should verify that

\[
\left( \frac{(1, -1, 0)}{\sqrt{2}}, \frac{(1, 1, 1)}{\sqrt{3}}, \frac{(1, 1, -2)}{\sqrt{6}} \right)
\]

is an orthonormal basis of R^3 consisting of eigenvectors of T and that
with respect to this basis, the matrix of T is the diagonal matrix

\[
\begin{bmatrix}
27 & 0 & 0 \\
0 & 9 & 0 \\
0 & 0 & -15
\end{bmatrix}.
\]
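The verification suggested above can be carried out exactly with
integer arithmetic by working with the unnormalized vectors. This is a
minimal sketch; the helper names apply and dot are made up for the
example.

```python
# Eigenvectors of the symmetric 3-by-3 matrix above, checked exactly.

def apply(A, v):
    """Matrix-vector product A v."""
    return [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]

def dot(x, y):
    """Euclidean inner product on R^n."""
    return sum(a * b for a, b in zip(x, y))

T = [[14, -13, 8], [-13, 14, 8], [8, 8, -7]]
# (unnormalized eigenvector, eigenvalue) pairs
pairs = [((1, -1, 0), 27), ((1, 1, 1), 9), ((1, 1, -2), -15)]

eigen_ok = all(apply(T, v) == [lam * x for x in v] for v, lam in pairs)
orthogonal = all(dot(pairs[i][0], pairs[j][0]) == 0
                 for i in range(3) for j in range(i + 1, 3))
norms_squared = [dot(v, v) for v, _ in pairs]
print(eigen_ok, orthogonal, norms_squared)
```

The squared norms 2, 3, and 6 explain the normalizing factors √2, √3,
and √6 in the orthonormal basis.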

Combining the complex spectral theorem and the real spectral theorem,
we conclude that every self-adjoint operator on V has a diagonal
matrix with respect to some orthonormal basis. This statement, which
is the most useful part of the spectral theorem, holds regardless of
whether F = C or F = R.

7.13 Real Spectral Theorem: Suppose that V is a real inner-product
space and T ∈ L(V). Then V has an orthonormal basis consisting of
eigenvectors of T if and only if T is self-adjoint.

Proof: First suppose that V has an orthonormal basis consisting of 
eigenvectors of T. With respect to this basis, T has a diagonal matrix. 
This matrix equals its conjugate transpose. Hence T = T* and so T is 
self-adjoint, as desired. 

To prove the other direction, now suppose that T is self-adjoint. We
will prove that V has an orthonormal basis consisting of eigenvectors
of T by induction on the dimension of V. To get started, note that our
desired result clearly holds if dim V = 1. Now assume that dim V > 1
and that the desired result holds on vector spaces of smaller dimension.

The idea of the proof is to take any eigenvector u of T with norm 1,
then adjoin to it an orthonormal basis of eigenvectors of T|_{{u}^⊥}. Now
for the details, the most important of which is verifying that T|_{{u}^⊥} is
self-adjoint (this allows us to apply our induction hypothesis).

Let λ be any eigenvalue of T (because T is self-adjoint, we know
from the previous lemma that it has an eigenvalue) and let u ∈ V
denote a corresponding eigenvector with ‖u‖ = 1. Let U denote the
one-dimensional subspace of V consisting of all scalar multiples of u.
Note that a vector v ∈ V is in U^⊥ if and only if ⟨u, v⟩ = 0.

Suppose v ∈ U^⊥. Then because T is self-adjoint, we have

\[
\langle u, Tv \rangle = \langle Tu, v \rangle = \langle \lambda u, v \rangle
= \lambda \langle u, v \rangle = 0,
\]

and hence Tv ∈ U^⊥. Thus Tv ∈ U^⊥ whenever v ∈ U^⊥. In other words,
U^⊥ is invariant under T. Thus we can define an operator S ∈ L(U^⊥) by
S = T|_{U^⊥}. If v, w ∈ U^⊥, then

\[
\langle Sv, w \rangle = \langle Tv, w \rangle = \langle v, Tw \rangle = \langle v, Sw \rangle,
\]

which shows that S is self-adjoint (note that in the middle equality
above we used the self-adjointness of T). Thus, by our induction
hypothesis, there is an orthonormal basis of U^⊥ consisting of eigenvectors
of S. Clearly every eigenvector of S is an eigenvector of T (because
Sv = Tv for every v ∈ U^⊥). Thus adjoining u to an orthonormal basis
of U^⊥ consisting of eigenvectors of S gives an orthonormal basis of V
consisting of eigenvectors of T, as desired. ■

For T ∈ L(V) self-adjoint (or, more generally, T ∈ L(V) normal
when F = C), the corollary below provides the nicest possible
decomposition of V into subspaces invariant under T. On each null(T − λ_j I),
the operator T is just multiplication by λ_j.

7.14 Corollary: Suppose that T ∈ L(V) is self-adjoint (or that F = C
and that T ∈ L(V) is normal). Let λ_1, ..., λ_m denote the distinct
eigenvalues of T. Then

\[
V = \operatorname{null}(T - \lambda_1 I) \oplus \dots \oplus \operatorname{null}(T - \lambda_m I).
\]

Furthermore, each vector in each null(T − λ_j I) is orthogonal to all
vectors in the other subspaces of this decomposition.

Proof: The spectral theorem (7.9 and 7.13) implies that V has a
basis consisting of eigenvectors of T. The desired decomposition of V
now follows from 5.21.

The orthogonality statement follows from 7.8. ■


To get an eigenvector of norm 1, take any nonzero eigenvector
and divide it by its norm.

Normal Operators on Real Inner-Product Spaces

The complex spectral theorem (7.9) gives a complete description 
of normal operators on complex inner-product spaces. In this section 
we will give a complete description of normal operators on real inner- 
product spaces. Along the way, we will encounter a proposition (7.18) 
and a technique (block diagonal matrices) that are useful for both real 
and complex inner-product spaces. 

We begin with a description of the operators on a two-dimensional 
real inner-product space that are normal but not self-adjoint. 

7.15 Lemma: Suppose V is a two-dimensional real inner-product
space and T ∈ L(V). Then the following are equivalent:

(a) T is normal but not self-adjoint;

(b) the matrix of T with respect to every orthonormal basis of V
has the form

\[ \begin{bmatrix} a & -b \\ b & a \end{bmatrix} \]

with b ≠ 0;

(c) the matrix of T with respect to some orthonormal basis of V has
the form

\[ \begin{bmatrix} a & -b \\ b & a \end{bmatrix} \]

with b > 0.


Proof: First suppose that (a) holds, so that T is normal but not
self-adjoint. Let (e_1, e_2) be an orthonormal basis of V. Suppose

\[
\mathcal{M}\bigl(T, (e_1, e_2)\bigr) = \begin{bmatrix} a & c \\ b & d \end{bmatrix}.
\tag{7.16}
\]

Then ‖Te_1‖² = a² + b² and ‖T*e_1‖² = a² + c². Because T is normal,
‖Te_1‖ = ‖T*e_1‖ (see 7.6); thus these equations imply that b² = c².
Thus c = b or c = −b. But c ≠ b because otherwise T would be
self-adjoint, as can be seen from the matrix in 7.16. Hence c = −b, so

\[
\mathcal{M}\bigl(T, (e_1, e_2)\bigr) = \begin{bmatrix} a & -b \\ b & d \end{bmatrix}.
\tag{7.17}
\]

Of course, the matrix of T* is the transpose of the matrix above. Use
matrix multiplication to compute the matrices of TT* and T*T (do it
now). Because T is normal, these two matrices must be equal. Equating
the entries in the upper-right corner of the two matrices you computed,
you will discover that bd = ab. Now b ≠ 0 because otherwise T would
be self-adjoint, as can be seen from the matrix in 7.17. Thus d = a,
completing the proof that (a) implies (b).

Now suppose that (b) holds. We want to prove that (c) holds. Choose
any orthonormal basis (e_1, e_2) of V. We know that the matrix of T with
respect to this basis has the form given by (b), with b ≠ 0. If b > 0,
then (c) holds and we have proved that (b) implies (c). If b < 0, then,
as you should verify, the matrix of T with respect to the orthonormal
basis (e_1, −e_2) equals

\[ \begin{bmatrix} a & b \\ -b & a \end{bmatrix}, \]

where −b > 0; thus in this case we also see that (b) implies (c).

Now suppose that (c) holds, so that the matrix of T with respect to
some orthonormal basis has the form given in (c) with b > 0. Clearly
the matrix of T is not equal to its transpose (because b ≠ 0), and hence
T is not self-adjoint. Now use matrix multiplication to verify that the
matrices of TT* and T*T are equal. We conclude that TT* = T*T, and
hence T is normal. Thus (c) implies (a), completing the proof. ■
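For readers who want to check their work on the matrix multiplication
requested in the proof that (a) implies (b), here is one way the
computation can be written out. It uses the matrix in 7.17 and its
transpose; nothing beyond the symbols a, b, d of the proof is assumed.

```latex
\[
\begin{bmatrix} a & -b \\ b & d \end{bmatrix}
\begin{bmatrix} a & b \\ -b & d \end{bmatrix}
=
\begin{bmatrix} a^2 + b^2 & ab - bd \\ ab - bd & b^2 + d^2 \end{bmatrix},
\qquad
\begin{bmatrix} a & b \\ -b & d \end{bmatrix}
\begin{bmatrix} a & -b \\ b & d \end{bmatrix}
=
\begin{bmatrix} a^2 + b^2 & bd - ab \\ bd - ab & b^2 + d^2 \end{bmatrix}.
\]
% Equating the upper-right entries gives ab - bd = bd - ab,
% that is, 2ab = 2bd, so bd = ab; with b \ne 0 this forces d = a.
```

The first product is the matrix of TT* and the second is the matrix of
T*T. Setting d = a makes every off-diagonal entry 0, which is also the
computation needed for the step from (c) to (a).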


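As an example of the notation we will use to write a matrix as a
matrix of smaller matrices, consider the matrix

\[
D = \begin{bmatrix}
1 & 1 & 2 & 2 & 2 \\
1 & 1 & 2 & 2 & 2 \\
0 & 0 & 3 & 3 & 3 \\
0 & 0 & 3 & 3 & 3 \\
0 & 0 & 3 & 3 & 3
\end{bmatrix}.
\]

We can write this matrix in the form

\[ D = \begin{bmatrix} A & B \\ 0 & C \end{bmatrix}, \]

where

\[
A = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}, \quad
B = \begin{bmatrix} 2 & 2 & 2 \\ 2 & 2 & 2 \end{bmatrix}, \quad
C = \begin{bmatrix} 3 & 3 & 3 \\ 3 & 3 & 3 \\ 3 & 3 & 3 \end{bmatrix},
\]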

and 0 denotes the 3-by-2 matrix consisting of all 0's.

Often we can understand a matrix better by thinking of it as composed
of smaller matrices. We will use this technique in the next proposition
and in later chapters.

The next result will play a key role in our characterization of the
normal operators on a real inner-product space.

7.18 Proposition: Suppose T ∈ L(V) is normal and U is a subspace
of V that is invariant under T. Then

(a) U^⊥ is invariant under T;

(b) U is invariant under T*;

(c) (T|_U)* = (T*)|_U;

(d) T|_U is a normal operator on U;

(e) T|_{U^⊥} is a normal operator on U^⊥.

Without normality, an easier result also holds: if T ∈ L(V) and U is
invariant under T, then U^⊥ is invariant under T*; see Exercise 29 in
Chapter 6.


Proof: First we will prove (a). Let (e_1, ..., e_m) be an orthonormal
basis of U. Extend to an orthonormal basis (e_1, ..., e_m, f_1, ..., f_n) of V
(this is possible by 6.25). Because U is invariant under T, each Te_j is
a linear combination of (e_1, ..., e_m). Thus the matrix of T with respect
to the basis (e_1, ..., e_m, f_1, ..., f_n) is of the form

\[
\mathcal{M}(T) =
\begin{array}{c|cc}
 & e_1 \,\dots\, e_m & f_1 \,\dots\, f_n \\ \hline
\begin{matrix} e_1 \\ \vdots \\ e_m \end{matrix} & A & B \\
\begin{matrix} f_1 \\ \vdots \\ f_n \end{matrix} & 0 & C
\end{array};
\]

here A denotes an m-by-m matrix, 0 denotes the n-by-m matrix
consisting of all 0's, B denotes an m-by-n matrix, C denotes an n-by-n
matrix, and for convenience the basis has been listed along the top and
left sides of the matrix.

For each j ∈ {1, ..., m}, ‖Te_j‖² equals the sum of the squares of the
absolute values of the entries in the j-th column of A (see 6.17). Hence

\[
\sum_{j=1}^{m} \|Te_j\|^2 =
\text{the sum of the squares of the absolute values of the entries of } A.
\tag{7.19}
\]

For each j ∈ {1, ..., m}, ‖T*e_j‖² equals the sum of the squares of the
absolute values of the entries in the j-th rows of A and B. Hence

\[
\sum_{j=1}^{m} \|T^*e_j\|^2 =
\text{the sum of the squares of the absolute values of the entries of } A \text{ and } B.
\tag{7.20}
\]

Because T is normal, ‖Te_j‖ = ‖T*e_j‖ for each j (see 7.6); thus

\[ \sum_{j=1}^{m} \|Te_j\|^2 = \sum_{j=1}^{m} \|T^*e_j\|^2. \]

This equation, along with 7.19 and 7.20, implies that the sum of the
squares of the absolute values of the entries of B must equal 0. In
other words, B must be the matrix consisting of all 0's. Thus


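\[
\mathcal{M}(T) =
\begin{array}{c|cc}
 & e_1 \,\dots\, e_m & f_1 \,\dots\, f_n \\ \hline
\begin{matrix} e_1 \\ \vdots \\ e_m \end{matrix} & A & 0 \\
\begin{matrix} f_1 \\ \vdots \\ f_n \end{matrix} & 0 & C
\end{array}.
\tag{7.21}
\]

This representation shows that Tf_k is in the span of (f_1, ..., f_n) for
each k. Because (f_1, ..., f_n) is a basis of U^⊥, this implies that Tv ∈ U^⊥
whenever v ∈ U^⊥. In other words, U^⊥ is invariant under T, completing
the proof of (a).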

To prove (b), note that M(T*) has a block of 0's in the lower left
corner (because M(T), as given above, has a block of 0's in the upper
right corner). In other words, each T*e_j can be written as a linear
combination of (e_1, ..., e_m). Thus U is invariant under T*, completing
the proof of (b).

To prove (c), let S = T|_U. Fix v ∈ U. Then

\[
\begin{aligned}
\langle Su, v \rangle &= \langle Tu, v \rangle \\
&= \langle u, T^*v \rangle
\end{aligned}
\]

for all u ∈ U. Because T*v ∈ U (by (b)), the equation above shows that
S*v = T*v. In other words, (T|_U)* = (T*)|_U, completing the proof
of (c).

To prove (d), note that T commutes with T* (because T is normal)
and that (T|_U)* = (T*)|_U (by (c)). Thus T|_U commutes with its adjoint
and hence is normal, completing the proof of (d).

To prove (e), note that in (d) we showed that the restriction of T to
any invariant subspace is normal. However, U^⊥ is invariant under T
(by (a)), and hence T|_{U^⊥} is normal. ■


The key step in the 
proof of the last 
proposition was 
showing that M(T) is 
an appropriate block 
diagonal matrix; 
see 7.21. 


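In proving 7.18 we thought of a matrix as composed of smaller matrices.
Now we need to make additional use of that idea. A block diagonal
matrix is a square matrix of the form

\[
\begin{bmatrix}
A_1 & & 0 \\
 & \ddots & \\
0 & & A_m
\end{bmatrix},
\]

where A_1, ..., A_m are square matrices lying along the diagonal and all
the other entries of the matrix equal 0. For example, the matrix

\[
A = \begin{bmatrix}
4 & 0 & 0 & 0 & 0 \\
0 & 2 & -3 & 0 & 0 \\
0 & 3 & 2 & 0 & 0 \\
0 & 0 & 0 & 1 & -7 \\
0 & 0 & 0 & 7 & 1
\end{bmatrix}
\tag{7.22}
\]

is a block diagonal matrix with

\[
A = \begin{bmatrix}
A_1 & & 0 \\
 & A_2 & \\
0 & & A_3
\end{bmatrix},
\]

where

\[
A_1 = \begin{bmatrix} 4 \end{bmatrix}, \quad
A_2 = \begin{bmatrix} 2 & -3 \\ 3 & 2 \end{bmatrix}, \quad
A_3 = \begin{bmatrix} 1 & -7 \\ 7 & 1 \end{bmatrix}.
\tag{7.23}
\]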

If A and B are block diagonal matrices of the form

\[
A = \begin{bmatrix}
A_1 & & 0 \\
 & \ddots & \\
0 & & A_m
\end{bmatrix},
\quad
B = \begin{bmatrix}
B_1 & & 0 \\
 & \ddots & \\
0 & & B_m
\end{bmatrix},
\]

where A_j has the same size as B_j for j = 1, ..., m, then AB is a block
diagonal matrix of the form

\[
AB = \begin{bmatrix}
A_1 B_1 & & 0 \\
 & \ddots & \\
0 & & A_m B_m
\end{bmatrix},
\tag{7.24}
\]

as you should verify. In other words, to multiply together two block
diagonal matrices (with the same size blocks), just multiply together the
corresponding entries on the diagonal, as with diagonal matrices.
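Formula 7.24 is easy to test on small examples. The sketch below builds
block diagonal matrices from lists of square blocks and compares the
full product with the block-by-block product. The blocks of A are those
of 7.23; the blocks of B and the helper names block_diag and matmul are
hypothetical choices for this illustration.

```python
# Check formula 7.24: the product of block diagonal matrices is the
# block diagonal matrix of the products of corresponding blocks.

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def block_diag(blocks):
    """Assemble square blocks along the diagonal of a larger zero matrix."""
    n = sum(len(b) for b in blocks)
    M = [[0] * n for _ in range(n)]
    offset = 0
    for b in blocks:
        for i, row in enumerate(b):
            for j, entry in enumerate(row):
                M[offset + i][offset + j] = entry
        offset += len(b)
    return M

A_blocks = [[[4]], [[2, -3], [3, 2]], [[1, -7], [7, 1]]]   # blocks from 7.23
B_blocks = [[[5]], [[0, 1], [1, 0]], [[1, 2], [3, 4]]]     # arbitrary blocks

A = block_diag(A_blocks)
B = block_diag(B_blocks)

full_product = matmul(A, B)
blockwise_product = block_diag(
    [matmul(a, b) for a, b in zip(A_blocks, B_blocks)])
print(full_product == blockwise_product)
```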

A diagonal matrix is a special case of a block diagonal matrix where
each block has size 1-by-1. At the other extreme, every square matrix is
a block diagonal matrix because we can take the first (and only) block
to be the entire matrix. Thus to say that an operator has a block
diagonal matrix with respect to some basis tells us nothing unless we
know something about the size of the blocks. The smaller the blocks,
the nicer the operator (in the vague sense that the matrix then contains
more 0's). The nicest situation is to have an orthonormal basis that
gives a diagonal matrix. We have shown that this happens on a complex
inner-product space precisely for the normal operators (see 7.9)
and on a real inner-product space precisely for the self-adjoint
operators (see 7.13).

Our next result states that each normal operator on a real
inner-product space comes close to having a diagonal matrix; specifically,
we get a block diagonal matrix with respect to some orthonormal basis,
with each block having size at most 2-by-2. We cannot expect to do better
than that because on a real inner-product space there exist normal
operators that do not have a diagonal matrix with respect to any basis.
For example, the operator T ∈ L(R^2) defined by T(x, y) = (−y, x) is
normal (as you should verify) but has no eigenvalues; thus this
particular T does not have even an upper-triangular matrix with respect to
any basis of R^2.
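The two claims about T(x, y) = (−y, x) can be explored with a short
computation. The sketch below checks normality through the matrix of T
in the standard basis and checks that Tv is orthogonal to v on a grid
of sample vectors. The orthogonality explains the absence of
eigenvalues: if Tv = λv with v ≠ 0, then 0 = ⟨Tv, v⟩ = λ‖v‖², forcing
λ = 0, yet Tv = 0 only when v = 0. The helper names are made up for
this example.

```python
# T(x, y) = (-y, x) on R^2: normal, and Tv is always orthogonal to v.

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    """Transpose (matrix of the adjoint in a real orthonormal basis)."""
    return [list(row) for row in zip(*A)]

def T(v):
    """The operator T(x, y) = (-y, x)."""
    x, y = v
    return (-y, x)

def dot(x, y):
    """Euclidean inner product on R^2."""
    return sum(a * b for a, b in zip(x, y))

M = [[0, -1], [1, 0]]          # matrix of T in the standard basis
Mt = transpose(M)
is_normal = matmul(M, Mt) == matmul(Mt, M)

samples = [(x, y) for x in range(-3, 4) for y in range(-3, 4)]
always_perpendicular = all(dot(T(v), v) == 0 for v in samples)
print(is_normal, always_perpendicular)
```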

Note that the matrix in 7.22 is the type of matrix promised by the 
theorem below. In particular, each block of 7.22 (see 7.23) has size 
at most 2-by-2 and each of the 2-by-2 blocks has the required form 
(upper left entry equals lower right entry, lower left entry is positive, 
and upper right entry equals the negative of lower left entry). 


Note that if an operator T has a block diagonal matrix with respect to
some basis, then the entry in any 1-by-1 block on the diagonal
of this matrix must be an eigenvalue of T.

7.25 Theorem: Suppose that V is a real inner-product space and
T ∈ L(V). Then T is normal if and only if there is an orthonormal
basis of V with respect to which T has a block diagonal matrix where
each block is a 1-by-1 matrix or a 2-by-2 matrix of the form

\[ \begin{bmatrix} a & -b \\ b & a \end{bmatrix} \tag{7.26} \]

with b > 0.

In a real vector space with dimension 1, there are precisely two
vectors with norm 1.

Many mathematicians also use the term positive semidefinite
operator, which means the same as positive operator.

Proof: To prove the easy direction, first suppose that there is an
orthonormal basis of V such that the matrix of T is a block diagonal
matrix where each block is a 1-by-1 matrix or a 2-by-2 matrix of the
form 7.26. With respect to this basis, the matrix of T commutes with
the matrix of T* (which is the conjugate transpose of the matrix of T),
as you should verify (use formula 7.24 for the product of two block
diagonal matrices). Thus T commutes with T*, which means that T is
normal.

To prove the other direction, now suppose that T is normal. We will
prove our desired result by induction on the dimension of V. To get
started, note that our desired result clearly holds if dim V = 1 (trivially)
or if dim V = 2 (if T is self-adjoint, use the real spectral theorem 7.13;
if T is not self-adjoint, use 7.15).

Now assume that dim V > 2 and that the desired result holds on
vector spaces of smaller dimension. Let U be a subspace of V of
dimension 1 that is invariant under T if such a subspace exists (in other
words, if T has a nonzero eigenvector, let U be the span of this
eigenvector). If no such subspace exists, let U be a subspace of V of
dimension 2 that is invariant under T (an invariant subspace of dimension 1
or 2 always exists by 5.24).

If dim U = 1, choose a vector in U with norm 1; this vector will
be an orthonormal basis of U, and of course the matrix of T|_U is a
1-by-1 matrix. If dim U = 2, then T|_U is normal (by 7.18) but not
self-adjoint (otherwise T|_U, and hence T, would have a nonzero eigenvector;
see 7.12), and thus we can choose an orthonormal basis of U with
respect to which the matrix of T|_U has the form 7.26 (see 7.15).

Now U^⊥ is invariant under T and T|_{U^⊥} is a normal operator on U^⊥
(see 7.18). Thus by our induction hypothesis, there is an orthonormal
basis of U^⊥ with respect to which the matrix of T|_{U^⊥} has the desired
form. Adjoining this basis to the basis of U gives an orthonormal basis
of V with respect to which the matrix of T has the desired form. ■

Positive Operators

An operator T ∈ L(V) is called positive if T is self-adjoint and

\[ \langle Tv, v \rangle \ge 0 \]

for all v ∈ V. Note that if V is a complex vector space, then the
condition that T be self-adjoint can be dropped from this definition
(by 7.3).

You should verify that every orthogonal projection is positive. For
another set of examples, look at the proof of 7.11, where we showed
that if T ∈ L(V) is self-adjoint and α, β ∈ R are such that α² < 4β,
then T² + αT + βI is positive.

An operator S is called a square root of an operator T if S² = T.
For example, if T ∈ L(F^3) is defined by T(z_1, z_2, z_3) = (z_3, 0, 0), then
the operator S ∈ L(F^3) defined by S(z_1, z_2, z_3) = (z_2, z_3, 0) is a square
root of T.
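The square root example can be confirmed by applying S twice. The
following minimal sketch represents vectors of F^3 as Python tuples; the
sample vectors are arbitrary.

```python
# S(z1, z2, z3) = (z2, z3, 0) is a square root of T(z1, z2, z3) = (z3, 0, 0).

def T(z):
    z1, z2, z3 = z
    return (z3, 0, 0)

def S(z):
    z1, z2, z3 = z
    return (z2, z3, 0)

samples = [(1, 2, 3), (1j, 2 - 1j, 5), (0, 0, 1)]
square_root_ok = all(S(S(z)) == T(z) for z in samples)
print(square_root_ok)
```

Note that neither S nor T here is self-adjoint, so this example
illustrates square roots in general rather than positive square roots.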

The following theorem is the main result about positive operators.
Note that its characterizations of the positive operators correspond to
characterizations of the nonnegative numbers among C. Specifically,
a complex number z is nonnegative if and only if it has a nonnegative
square root, corresponding to condition (c) below. Also, z is
nonnegative if and only if it has a real square root, corresponding to
condition (d) below. Finally, z is nonnegative if and only if there exists
a complex number w such that z = w̄w (where w̄ is the complex conjugate
of w), corresponding to condition (e) below.

7.27 Theorem: Let T ∈ L(V). Then the following are equivalent:

(a) T is positive;

(b) T is self-adjoint and all the eigenvalues of T are nonnegative;

(c) T has a positive square root;

(d) T has a self-adjoint square root;

(e) there exists an operator S ∈ L(V) such that T = S*S.

Proof: We will prove that (a) ⇒ (b) ⇒ (c) ⇒ (d) ⇒ (e) ⇒ (a).

First suppose that (a) holds, so that T is positive. Obviously T is
self-adjoint (by the definition of a positive operator). To prove the other
condition in (b), suppose that λ is an eigenvalue of T. Let v be a nonzero
eigenvector of T corresponding to λ. Then

\[
\begin{aligned}
0 &\le \langle Tv, v \rangle \\
&= \langle \lambda v, v \rangle \\
&= \lambda \langle v, v \rangle,
\end{aligned}
\]

and thus λ is a nonnegative number. Hence (b) holds.

Now suppose that (b) holds, so that T is self-adjoint and all the
eigenvalues of T are nonnegative. By the spectral theorem (7.9 and 7.13),
there is an orthonormal basis (e_1, ..., e_n) of V consisting of
eigenvectors of T. Let λ_1, ..., λ_n be the eigenvalues of T corresponding to
e_1, ..., e_n, so that each λ_j is a nonnegative number. Define S ∈ L(V)
by

\[ Se_j = \sqrt{\lambda_j}\, e_j \]

for j = 1, ..., n. Then S is a positive operator, as you should verify.
Furthermore, S²e_j = λ_j e_j = Te_j for each j, which implies that S² = T.
Thus S is a positive square root of T, and hence (c) holds.

Clearly (c) implies (d) (because, by definition, every positive operator
is self-adjoint).

Now suppose that (d) holds, meaning that there exists a self-adjoint
operator S on V such that T = S². Then T = S*S (because S* = S), and
hence (e) holds.

Finally, suppose that (e) holds. Let S ∈ L(V) be such that T = S*S.
Then T* = (S*S)* = S*(S*)* = S*S = T, and hence T is self-adjoint.
To complete the proof that (a) holds, note that

\[
\begin{aligned}
\langle Tv, v \rangle &= \langle S^*Sv, v \rangle \\
&= \langle Sv, Sv \rangle \\
&\ge 0
\end{aligned}
\]

for every v ∈ V. Thus T is positive. ■

The positive operators correspond, in some sense, to the numbers
[0, ∞), so better terminology would call these nonnegative instead of
positive. However, operator theorists consistently call these the
positive operators, so we will follow that custom.
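The construction of a positive square root in the step from (b) to (c)
can be imitated numerically. The sketch below is an illustration under
stated assumptions: V = R^2, the orthonormal eigenvectors and
eigenvalues are supplied by hand (they are not computed), and the
helper names from_eigen and matmul are made up. An operator defined by
e_j ↦ f(λ_j) e_j has matrix equal to the sum of f(λ_j) e_j e_j^T in the
standard basis.

```python
# Build T and its positive square root S from an orthonormal eigenbasis.
import math

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def from_eigen(pairs, f):
    """Matrix of the operator sending each e_j to f(lambda_j) * e_j (real case)."""
    n = len(pairs[0][1])
    return [[sum(f(lam) * e[i] * e[j] for lam, e in pairs)
             for j in range(n)] for i in range(n)]

r = 1 / math.sqrt(2)
# (eigenvalue, orthonormal eigenvector) pairs, chosen by hand for this example
eigenpairs = [(3.0, (r, r)), (1.0, (r, -r))]

T = from_eigen(eigenpairs, lambda lam: lam)    # approximately [[2, 1], [1, 2]]
S = from_eigen(eigenpairs, math.sqrt)          # S e_j = sqrt(lambda_j) e_j
S_squared = matmul(S, S)

max_error = max(abs(S_squared[i][j] - T[i][j])
                for i in range(2) for j in range(2))
print(T, S, max_error)
```

Because floating-point arithmetic is used, S² agrees with T only up to
rounding error, which is why the comparison is made through max_error.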

Each nonnegative number has a unique nonnegative square root.
The next proposition shows that positive operators enjoy a similar
property. Because of this proposition, we can use the notation √T
to denote the unique positive square root of a positive operator T, just
as √λ denotes the unique nonnegative square root of a nonnegative
number λ.


A positive operator can 
have infinitely many 
square roots (though 
only one of them can 
be positive). For 
example, the identity 
operator on V has 
infinitely many square 
roots if dim V > 1. 


7.28 Proposition: Every positive operator on V has a unique positive
square root.

Proof: Suppose T ∈ L(V) is positive. Let λ_1, ..., λ_m denote the
distinct eigenvalues of T; because T is positive, all these numbers are
nonnegative (by 7.27). Because T is self-adjoint, we have

\[
V = \operatorname{null}(T - \lambda_1 I) \oplus \dots \oplus \operatorname{null}(T - \lambda_m I);
\tag{7.29}
\]

see 7.14.

Now suppose S ∈ L(V) is a positive square root of T. Suppose α is
an eigenvalue of S. If v ∈ null(S − αI), then Sv = αv, which implies
that

\[ Tv = S^2 v = \alpha^2 v, \tag{7.30} \]

so v ∈ null(T − α²I). Thus α² is an eigenvalue of T, which means
that α² must equal some λ_j. Because S is positive, α is nonnegative
(by 7.27), so α = √λ_j for some j.
Furthermore, 7.30 implies that

\[
\operatorname{null}(S - \sqrt{\lambda_j}\, I) \subset \operatorname{null}(T - \lambda_j I).
\tag{7.31}
\]

In the paragraph above, we showed that the only possible eigenvalues
for S are √λ_1, ..., √λ_m. Because S is self-adjoint, this implies that

\[
V = \operatorname{null}(S - \sqrt{\lambda_1}\, I) \oplus \dots \oplus \operatorname{null}(S - \sqrt{\lambda_m}\, I);
\tag{7.32}
\]

see 7.14. Now 7.29, 7.32, and 7.31 imply that

\[
\operatorname{null}(S - \sqrt{\lambda_j}\, I) = \operatorname{null}(T - \lambda_j I)
\]

for each j. In other words, on null(T − λ_j I), the operator S is just
multiplication by √λ_j. Thus S, the positive square root of T, is uniquely
determined by T. ■


Isometries 


An operator S ∈ L(V) is called an isometry if

\[ \|Sv\| = \|v\| \]

for all v ∈ V. In other words, an operator is an isometry if it preserves
norms. For example, λI is an isometry whenever λ ∈ F satisfies |λ| = 1.
More generally, suppose λ_1, ..., λ_n are scalars with absolute value 1 and
S ∈ L(V) satisfies S(e_j) = λ_j e_j for some orthonormal basis (e_1, ..., e_n)
of V. Suppose v ∈ V. Then

\[
v = \langle v, e_1 \rangle e_1 + \dots + \langle v, e_n \rangle e_n
\tag{7.33}
\]

and

\[
\|v\|^2 = |\langle v, e_1 \rangle|^2 + \dots + |\langle v, e_n \rangle|^2,
\tag{7.34}
\]

where we have used 6.17. Applying S to both sides of 7.33 gives

\[
\begin{aligned}
Sv &= \langle v, e_1 \rangle Se_1 + \dots + \langle v, e_n \rangle Se_n \\
&= \lambda_1 \langle v, e_1 \rangle e_1 + \dots + \lambda_n \langle v, e_n \rangle e_n.
\end{aligned}
\]

The last equation, along with the equation |λ_j| = 1, shows that

\[
\|Sv\|^2 = |\langle v, e_1 \rangle|^2 + \dots + |\langle v, e_n \rangle|^2.
\tag{7.35}
\]

Comparing 7.34 and 7.35 shows that ‖v‖ = ‖Sv‖. In other words, S is
an isometry.

The Greek word isos means equal; the Greek word metron means
measure. Thus isometry literally means equal measure.

An isometry on a real inner-product space is often called an
orthogonal operator. An isometry on a complex inner-product space is
often called a unitary operator. We will use the term isometry so that
our results can apply to both real and complex inner-product spaces.

For another example, let $\theta \in \mathbf{R}$. Then the operator on $\mathbf{R}^2$ of
counterclockwise rotation (centered at the origin) by an angle of $\theta$ is an
isometry (you should find the matrix of this operator with respect to
the standard basis of $\mathbf{R}^2$).

If $S \in \mathcal{L}(V)$ is an isometry, then S is injective (because if $Sv = 0$,
then $\|v\| = \|Sv\| = 0$, and hence $v = 0$). Thus every isometry is
invertible (by 3.21).

The next theorem provides several conditions that are equivalent
to being an isometry. These equivalences have several important
interpretations. In particular, the equivalence of (a) and (b) shows that
an isometry preserves inner products. Because (a) implies (d), we see
that if S is an isometry and $(e_1, \dots, e_n)$ is an orthonormal basis of V,
then the columns of the matrix of S (with respect to this basis) are
orthonormal; because (e) implies (a), we see that the converse also holds.
Because (a) is equivalent to conditions (i) and (j), we see that in the last
sentence we can replace “columns” with “rows”.

7.36 Theorem: Suppose $S \in \mathcal{L}(V)$. Then the following are
equivalent:

(a) S is an isometry;

(b) $\langle Su, Sv \rangle = \langle u, v \rangle$ for all $u, v \in V$;

(c) $S^*S = I$;

(d) $(Se_1, \dots, Se_n)$ is orthonormal whenever $(e_1, \dots, e_n)$ is an
orthonormal list of vectors in V;

(e) there exists an orthonormal basis $(e_1, \dots, e_n)$ of V such that
$(Se_1, \dots, Se_n)$ is orthonormal;

(f) $S^*$ is an isometry;

(g) $\langle S^*u, S^*v \rangle = \langle u, v \rangle$ for all $u, v \in V$;

(h) $SS^* = I$;

(i) $(S^*e_1, \dots, S^*e_n)$ is orthonormal whenever $(e_1, \dots, e_n)$ is an
orthonormal list of vectors in V;

(j) there exists an orthonormal basis $(e_1, \dots, e_n)$ of V such that
$(S^*e_1, \dots, S^*e_n)$ is orthonormal.

Proof: First suppose that (a) holds. If V is a real inner-product
space, then for every $u, v \in V$ we have

$\langle Su, Sv \rangle = (\|Su + Sv\|^2 - \|Su - Sv\|^2)/4$
$= (\|S(u + v)\|^2 - \|S(u - v)\|^2)/4$
$= (\|u + v\|^2 - \|u - v\|^2)/4$
$= \langle u, v \rangle$,

where the first equality comes from Exercise 6 in Chapter 6, the second
equality comes from the linearity of S, the third equality holds because
S is an isometry, and the last equality again comes from Exercise 6 in
Chapter 6. If V is a complex inner-product space, then use Exercise 7
in Chapter 6 instead of Exercise 6 to obtain the same conclusion. In
either case, we see that (a) implies (b).

Now suppose that (b) holds. Then

$\langle (S^*S - I)u, v \rangle = \langle Su, Sv \rangle - \langle u, v \rangle$
$= 0$

for every $u, v \in V$. Taking $v = (S^*S - I)u$, we see that $S^*S - I = 0$.
Hence $S^*S = I$, proving that (b) implies (c).

Now suppose that (c) holds. Suppose $(e_1, \dots, e_n)$ is an orthonormal
list of vectors in V. Then

$\langle Se_j, Se_k \rangle = \langle S^*Se_j, e_k \rangle$
$= \langle e_j, e_k \rangle$.

Hence $(Se_1, \dots, Se_n)$ is orthonormal, proving that (c) implies (d).
Obviously (d) implies (e).

Now suppose (e) holds. Let $(e_1, \dots, e_n)$ be an orthonormal basis of V
such that $(Se_1, \dots, Se_n)$ is orthonormal. If $v \in V$, then

$\|Sv\|^2 = \|S(\langle v, e_1 \rangle e_1 + \cdots + \langle v, e_n \rangle e_n)\|^2$
$= \|\langle v, e_1 \rangle Se_1 + \cdots + \langle v, e_n \rangle Se_n\|^2$
$= |\langle v, e_1 \rangle|^2 + \cdots + |\langle v, e_n \rangle|^2$
$= \|v\|^2$,

where the first and last equalities come from 6.17. Taking square roots,
we see that S is an isometry, proving that (e) implies (a).

Having shown that (a) ⇒ (b) ⇒ (c) ⇒ (d) ⇒ (e) ⇒ (a), we know at this
stage that (a) through (e) are all equivalent to each other. Replacing S
with $S^*$, we see that (f) through (j) are all equivalent to each other. Thus
to complete the proof, we need only show that one of the conditions
in the group (a) through (e) is equivalent to one of the conditions in
the group (f) through (j). The easiest way to connect the two groups of
conditions is to show that (c) is equivalent to (h). In general, of course,
S need not commute with $S^*$. However, $S^*S = I$ if and only if $SS^* = I$;
this is a special case of Exercise 23 in Chapter 3. Thus (c) is equivalent
to (h), completing the proof. ■
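Conditions (a), (b), and (c) of 7.36 can be checked numerically for the rotation operator on $\mathbf{R}^2$. This is a minimal sketch, assuming the standard basis and the usual dot product; in that setting the matrix of $S^*$ is the transpose of the matrix of S. The helper names are illustrative only.

```python
import math

def rotation(theta):
    """Matrix of counterclockwise rotation of R^2 by the angle theta."""
    return [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta), math.cos(theta)]]

def dot(u, v):
    """Standard inner product on R^n."""
    return sum(a * b for a, b in zip(u, v))

def apply(m, v):
    """Apply the matrix m to the vector v."""
    return [dot(row, v) for row in m]

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    return [[dot(row, col) for col in zip(*b)] for row in a]

S = rotation(0.7)
u, v = [1.0, 2.0], [-3.0, 0.5]

# (a): S preserves norms (compared here through squared norms).
norm_preserved = math.isclose(dot(apply(S, v), apply(S, v)), dot(v, v))
# (b): S preserves inner products.
inner_preserved = math.isclose(dot(apply(S, u), apply(S, v)), dot(u, v))
# (c): S*S = I, with the adjoint realized as the transpose.
gram = matmul(transpose(S), S)
```

The entries of `gram` are the inner products of the columns of S, so `gram` being the identity is the statement that the columns are orthonormal.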

The last theorem shows that every isometry is normal (see (a), (c), 
and (h) of 7.36). Thus the characterizations of normal operators can 
be used to give complete descriptions of isometries. We do this in the 
next two theorems. 

7.37 Theorem: Suppose V is a complex inner-product space and
$S \in \mathcal{L}(V)$. Then S is an isometry if and only if there is an orthonormal
basis of V consisting of eigenvectors of S all of whose corresponding
eigenvalues have absolute value 1.

Proof: We already proved (see the first paragraph of this section)
that if there is an orthonormal basis of V consisting of eigenvectors of S
all of whose eigenvalues have absolute value 1, then S is an isometry.

To prove the other direction, suppose S is an isometry. By the
complex spectral theorem (7.9), there is an orthonormal basis $(e_1, \dots, e_n)$
of V consisting of eigenvectors of S. For $j \in \{1, \dots, n\}$, let $\lambda_j$ be the
eigenvalue corresponding to $e_j$. Then

$|\lambda_j| = \|\lambda_j e_j\| = \|Se_j\| = \|e_j\| = 1$.

Thus each eigenvalue of S has absolute value 1, completing the proof. ■
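As an illustration of 7.37, the rotation matrix, regarded as an operator on $\mathbf{C}^2$, has the eigenvector $(1, -i)$ with eigenvalue $e^{i\theta}$, which has absolute value 1. The sketch below checks this for one sample angle; the choice $\theta = 0.7$ and the helper `apply` are assumptions made for the example.

```python
import cmath
import math

theta = 0.7
c, s = math.cos(theta), math.sin(theta)
S = [[c, -s],
     [s, c]]

def apply(m, v):
    """Apply the matrix m to the (possibly complex) vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

v = [1, -1j]                        # candidate eigenvector in C^2
eigenvalue = cmath.exp(1j * theta)  # cos(theta) + i sin(theta)
Sv = apply(S, v)

# Sv should equal eigenvalue * v, and the eigenvalue lies on the unit circle.
residual = max(abs(Sv[k] - eigenvalue * v[k]) for k in range(2))
modulus = abs(eigenvalue)
```

The vector $(1, i)$ is an eigenvector for the conjugate eigenvalue $e^{-i\theta}$, and the two eigenvectors are orthogonal in $\mathbf{C}^2$, as the theorem predicts.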
If $\theta \in \mathbf{R}$, then the operator on $\mathbf{R}^2$ of counterclockwise rotation
(centered at the origin) by an angle of $\theta$ has matrix 7.39 with respect to
the standard basis, as you should verify. The next result states that
every isometry on a real inner-product space is composed of pieces that
look like rotations on two-dimensional subspaces, pieces that equal the
identity operator, and pieces that equal multiplication by -1.

7.38 Theorem: Suppose that V is a real inner-product space and
$S \in \mathcal{L}(V)$. Then S is an isometry if and only if there is an orthonormal
basis of V with respect to which S has a block diagonal matrix where
each block on the diagonal is a 1-by-1 matrix containing 1 or -1 or a
2-by-2 matrix of the form

7.39 $\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$

with $\theta \in (0, \pi)$.

This theorem implies that an isometry on an odd-dimensional real
inner-product space must have 1 or -1 as an eigenvalue.

Proof: First suppose that S is an isometry. Because S is normal,
there is an orthonormal basis of V such that with respect to this basis
S has a block diagonal matrix, where each block is a 1-by-1 matrix or a
2-by-2 matrix of the form

7.40 $\begin{bmatrix} a & -b \\ b & a \end{bmatrix}$

with $b > 0$ (see 7.25).

If $\lambda$ is an entry in a 1-by-1 block along the diagonal of the matrix of S (with
respect to the basis mentioned above), then there is a basis vector $e_j$
such that $Se_j = \lambda e_j$. Because S is an isometry, this implies that $|\lambda| = 1$.
Thus $\lambda = 1$ or $\lambda = -1$ because these are the only real numbers with
absolute value 1.

Now consider a 2-by-2 matrix of the form 7.40 along the diagonal of
the matrix of S. There are basis vectors $e_j, e_{j+1}$ such that

$Se_j = a e_j + b e_{j+1}$.

Thus

$1 = \|e_j\|^2 = \|Se_j\|^2 = a^2 + b^2$.

The equation above, along with the condition $b > 0$, implies that there
exists a number $\theta \in (0, \pi)$ such that $a = \cos\theta$ and $b = \sin\theta$. Thus the
matrix 7.40 has the required form 7.39, completing the proof in this
direction.

Conversely, now suppose that there is an orthonormal basis of V
with respect to which the matrix of S has the form required by the
theorem. Thus there is a direct sum decomposition

$V = U_1 \oplus \cdots \oplus U_m$,

where each $U_j$ is a subspace of V of dimension 1 or 2. Furthermore,
any two vectors belonging to distinct U's are orthogonal, and each $S|_{U_j}$
is an isometry mapping $U_j$ into $U_j$. If $v \in V$, we can write

$v = u_1 + \cdots + u_m$,

where each $u_j \in U_j$. Applying S to the equation above and then taking
norms gives

$\|Sv\|^2 = \|Su_1 + \cdots + Su_m\|^2$
$= \|Su_1\|^2 + \cdots + \|Su_m\|^2$
$= \|u_1\|^2 + \cdots + \|u_m\|^2$
$= \|v\|^2$.

Thus S is an isometry, as desired. ■
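As a small worked illustration of 7.38 (an example added here, not taken from the text), consider the operator S on $\mathbf{R}^3$ whose matrix with respect to the standard basis has a 1-by-1 block containing -1 and a 2-by-2 block of the form 7.39 with $\theta = \pi/2$:

```latex
\[
\mathcal{M}(S) =
\begin{bmatrix}
-1 & 0 & 0 \\
 0 & 0 & -1 \\
 0 & 1 & 0
\end{bmatrix},
\qquad
S(x, y, z) = (-x, -z, y),
\]
\[
\|S(x, y, z)\|^2 = x^2 + z^2 + y^2 = \|(x, y, z)\|^2 .
\]
```

Here $(1, 0, 0)$ is an eigenvector with eigenvalue -1, consistent with the remark that an isometry on an odd-dimensional real inner-product space must have 1 or -1 as an eigenvalue.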

Polar and Singular-Value Decompositions

Recall our analogy between $\mathbf{C}$ and $\mathcal{L}(V)$. Under this analogy, a
complex number z corresponds to an operator T, and $\bar{z}$ corresponds to $T^*$.
The real numbers correspond to the self-adjoint operators, and the
nonnegative numbers correspond to the (badly named) positive operators.
Another distinguished subset of $\mathbf{C}$ is the unit circle, which consists of
the complex numbers z such that $|z| = 1$. The condition $|z| = 1$ is
equivalent to the condition $\bar{z}z = 1$. Under our analogy, this would
correspond to the condition $T^*T = I$, which is equivalent to T being an
isometry (see 7.36). In other words, the unit circle in $\mathbf{C}$ corresponds to
the isometries.

Continuing with our analogy, note that each complex number z
except 0 can be written in the form

$z = \left(\frac{z}{|z|}\right)|z| = \left(\frac{z}{|z|}\right)\sqrt{\bar{z}z}$,

where the first factor, namely, $z/|z|$, is an element of the unit circle. Our
analogy leads us to guess that any operator $T \in \mathcal{L}(V)$ can be written
as an isometry times $\sqrt{T^*T}$. That guess is indeed correct, as we now
prove.


7.41 Polar Decomposition: If $T \in \mathcal{L}(V)$, then there exists an
isometry $S \in \mathcal{L}(V)$ such that

$T = S\sqrt{T^*T}$.

If you know a bit of complex analysis, you will recognize the
analogy to polar coordinates for complex numbers: every complex number
can be written in the form $e^{\theta i} r$, where $\theta \in [0, 2\pi)$ and $r \ge 0$.
Note that $e^{\theta i}$ is in the unit circle, corresponding to S being an
isometry, and r is nonnegative, corresponding to $\sqrt{T^*T}$ being a positive
operator.

Proof: Suppose $T \in \mathcal{L}(V)$. If $v \in V$, then

$\|Tv\|^2 = \langle Tv, Tv \rangle$
$= \langle T^*Tv, v \rangle$
$= \langle \sqrt{T^*T}\sqrt{T^*T}v, v \rangle$
$= \langle \sqrt{T^*T}v, \sqrt{T^*T}v \rangle$
$= \|\sqrt{T^*T}v\|^2$.

Thus

7.42 $\|Tv\| = \|\sqrt{T^*T}v\|$

for all $v \in V$.

Define a linear map $S_1 : \operatorname{range} \sqrt{T^*T} \to \operatorname{range} T$ by

7.43 $S_1(\sqrt{T^*T}v) = Tv$.

The idea of the proof is to extend $S_1$ to an isometry $S \in \mathcal{L}(V)$ such that
$T = S\sqrt{T^*T}$. Now for the details.

First we must check that $S_1$ is well defined. To do this, suppose
$v_1, v_2 \in V$ are such that $\sqrt{T^*T}v_1 = \sqrt{T^*T}v_2$. For the definition given
by 7.43 to make sense, we must show that $Tv_1 = Tv_2$. However,

$\|Tv_1 - Tv_2\| = \|T(v_1 - v_2)\|$
$= \|\sqrt{T^*T}(v_1 - v_2)\|$
$= \|\sqrt{T^*T}v_1 - \sqrt{T^*T}v_2\|$
$= 0$,

where the second equality holds by 7.42. The equation above shows
that $Tv_1 = Tv_2$, so $S_1$ is indeed well defined. You should verify that $S_1$
is a linear map.

In the rest of the proof all we are doing is extending $S_1$ to an
isometry S on all of V.

We see from 7.43 that $S_1$ maps $\operatorname{range} \sqrt{T^*T}$ onto $\operatorname{range} T$. Clearly
7.42 and 7.43 imply that $\|S_1 u\| = \|u\|$ for all $u \in \operatorname{range} \sqrt{T^*T}$. In
particular, $S_1$ is injective. Thus from 3.4, applied to $S_1$, we have

$\dim \operatorname{range} \sqrt{T^*T} = \dim \operatorname{range} T$.

This implies that $\dim (\operatorname{range} \sqrt{T^*T})^\perp = \dim (\operatorname{range} T)^\perp$ (see Exercise 15
in Chapter 6). Thus orthonormal bases $(e_1, \dots, e_m)$ of $(\operatorname{range} \sqrt{T^*T})^\perp$
and $(f_1, \dots, f_m)$ of $(\operatorname{range} T)^\perp$ can be chosen; the key point here is that
these two orthonormal bases have the same length. Define a linear map
$S_2 : (\operatorname{range} \sqrt{T^*T})^\perp \to (\operatorname{range} T)^\perp$ by

$S_2(a_1 e_1 + \cdots + a_m e_m) = a_1 f_1 + \cdots + a_m f_m$.

Obviously $\|S_2 w\| = \|w\|$ for all $w \in (\operatorname{range} \sqrt{T^*T})^\perp$.

Now let S be the operator on V that equals $S_1$ on $\operatorname{range} \sqrt{T^*T}$ and
equals $S_2$ on $(\operatorname{range} \sqrt{T^*T})^\perp$. More precisely, recall that each $v \in V$
can be written uniquely in the form

7.44 $v = u + w$,

where $u \in \operatorname{range} \sqrt{T^*T}$ and $w \in (\operatorname{range} \sqrt{T^*T})^\perp$ (see 6.29). For $v \in V$
with decomposition as above, define Sv by

$Sv = S_1 u + S_2 w$.

For each $v \in V$ we have

$S(\sqrt{T^*T}v) = S_1(\sqrt{T^*T}v) = Tv$,

so $T = S\sqrt{T^*T}$, as desired. All that remains is to show that S is an
isometry. However, this follows easily from two uses of the Pythagorean
theorem: if $v \in V$ has decomposition as in 7.44, then

$\|Sv\|^2 = \|S_1 u + S_2 w\|^2$
$= \|S_1 u\|^2 + \|S_2 w\|^2$
$= \|u\|^2 + \|w\|^2$
$= \|v\|^2$,

where the second equality above holds because $S_1 u \in \operatorname{range} T$ and
$S_2 w \in (\operatorname{range} T)^\perp$. ■
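For an invertible operator the polar decomposition can be computed directly, because then $\sqrt{T^*T}$ is invertible and $S = T(\sqrt{T^*T})^{-1}$. The sketch below does this for a real 2-by-2 matrix. It relies on two assumptions that are not part of the proof above: T is invertible, and the square root of the positive 2-by-2 matrix $A = T^*T$ is obtained from the closed form $\sqrt{A} = (A + \sqrt{\det A}\, I)/\sqrt{\operatorname{tr} A + 2\sqrt{\det A}}$, which follows from the Cayley-Hamilton theorem. The function names are illustrative.

```python
import math

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def sqrt_positive_2x2(a):
    """Positive square root of a nonzero positive 2-by-2 real matrix."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    s = math.sqrt(max(det, 0.0))                # det of the square root
    t = math.sqrt(a[0][0] + a[1][1] + 2.0 * s)  # trace of the square root
    return [[(a[0][0] + s) / t, a[0][1] / t],
            [a[1][0] / t, (a[1][1] + s) / t]]

def inverse_2x2(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

def polar_2x2(t):
    """Return (S, R) with T = S R and R = sqrt(T*T); assumes T is invertible."""
    r = sqrt_positive_2x2(matmul(transpose(t), t))
    s = matmul(t, inverse_2x2(r))
    return s, r

T = [[1.0, 2.0], [3.0, 4.0]]
S, R = polar_2x2(T)
```

For a non-invertible T this shortcut is unavailable, which is why the proof instead extends $S_1$ by an isometry $S_2$ between the orthogonal complements.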
The polar decomposition (7.41) states that each operator on V is the
product of an isometry and a positive operator. Thus we can write each
operator on V as the product of two operators, each of which comes
from a class that we have completely described and that we understand
reasonably well. The isometries are described by 7.37 and 7.38;
the positive operators (which are all self-adjoint) are described by the
spectral theorem (7.9 and 7.13).

Specifically, suppose $T = S\sqrt{T^*T}$ is the polar decomposition of
$T \in \mathcal{L}(V)$, where S is an isometry. Then there is an orthonormal basis
of V with respect to which S has a diagonal matrix (if $\mathbf{F} = \mathbf{C}$) or a block
diagonal matrix with blocks of size at most 2-by-2 (if $\mathbf{F} = \mathbf{R}$), and there
is an orthonormal basis of V with respect to which $\sqrt{T^*T}$ has a
diagonal matrix. Warning: there may not exist an orthonormal basis that
simultaneously puts the matrices of both S and $\sqrt{T^*T}$ into these nice
forms (diagonal or block diagonal with small blocks). In other words, S
may require one orthonormal basis and $\sqrt{T^*T}$ may require a different
orthonormal basis.

Suppose $T \in \mathcal{L}(V)$. The singular values of T are the eigenvalues
of $\sqrt{T^*T}$, with each eigenvalue $\lambda$ repeated $\dim \operatorname{null}(\sqrt{T^*T} - \lambda I)$ times.
The singular values of T are all nonnegative because they are the
eigenvalues of the positive operator $\sqrt{T^*T}$.

For example, if $T \in \mathcal{L}(\mathbf{F}^4)$ is defined by

7.45 $T(z_1, z_2, z_3, z_4) = (0, 3z_1, 2z_2, -3z_4)$,

then $T^*T(z_1, z_2, z_3, z_4) = (9z_1, 4z_2, 0, 9z_4)$, as you should verify. Thus

$\sqrt{T^*T}(z_1, z_2, z_3, z_4) = (3z_1, 2z_2, 0, 3z_4)$,

and we see that the eigenvalues of $\sqrt{T^*T}$ are 3, 2, 0. Clearly

$\dim \operatorname{null}(\sqrt{T^*T} - 3I) = 2$, $\dim \operatorname{null}(\sqrt{T^*T} - 2I) = 1$, $\dim \operatorname{null} \sqrt{T^*T} = 1$.

Hence the singular values of T are 3, 3, 2, 0. In this example -3 and 0
are the only eigenvalues of T, as you should verify.

Each $T \in \mathcal{L}(V)$ has dim V singular values, as can be seen by applying
the spectral theorem and 5.21 (see especially part (e)) to the positive
(hence self-adjoint) operator $\sqrt{T^*T}$. For example, the operator T
defined by 7.45 on the four-dimensional vector space $\mathbf{F}^4$ has four singular
values (they are 3, 3, 2, 0), as we saw in the previous paragraph.
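The computation for the operator 7.45 can be reproduced with a few lines of code. This is a minimal sketch using the matrix of T with respect to the standard basis; because T has real entries, the matrix of $T^*$ is the transpose. Reading the singular values off the diagonal works here only because $T^*T$ turns out to be diagonal.

```python
import math

# Matrix of T(z1, z2, z3, z4) = (0, 3 z1, 2 z2, -3 z4) in the standard basis.
T = [[0, 0, 0, 0],
     [3, 0, 0, 0],
     [0, 2, 0, 0],
     [0, 0, 0, -3]]

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

T_star_T = matmul(transpose(T), T)

# T*T is diagonal, so its eigenvalues are its diagonal entries, and the
# singular values of T are their nonnegative square roots.
singular_values = sorted((math.sqrt(T_star_T[j][j]) for j in range(4)),
                         reverse=True)

# T itself is lower triangular, so its eigenvalues are its diagonal entries.
eigenvalues_of_T = sorted({T[j][j] for j in range(4)})
```

The singular values come out as 3, 3, 2, 0, while the only eigenvalues of T are -3 and 0, matching the discussion above.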

The next result shows that every operator on V has a nice
description in terms of its singular values and two orthonormal bases of V.

7.46 Singular-Value Decomposition: Suppose $T \in \mathcal{L}(V)$ has
singular values $s_1, \dots, s_n$. Then there exist orthonormal bases $(e_1, \dots, e_n)$
and $(f_1, \dots, f_n)$ of V such that

7.47 $Tv = s_1 \langle v, e_1 \rangle f_1 + \cdots + s_n \langle v, e_n \rangle f_n$

for every $v \in V$.

Proof: By the spectral theorem (also see 7.14) applied to $\sqrt{T^*T}$,
there is an orthonormal basis $(e_1, \dots, e_n)$ of V such that $\sqrt{T^*T} e_j = s_j e_j$
for $j = 1, \dots, n$. We have

$v = \langle v, e_1 \rangle e_1 + \cdots + \langle v, e_n \rangle e_n$

for every $v \in V$ (see 6.17). Apply $\sqrt{T^*T}$ to both sides of this equation,
getting

$\sqrt{T^*T}v = s_1 \langle v, e_1 \rangle e_1 + \cdots + s_n \langle v, e_n \rangle e_n$

for every $v \in V$. By the polar decomposition (see 7.41), there is an
isometry $S \in \mathcal{L}(V)$ such that $T = S\sqrt{T^*T}$. Apply S to both sides of the
equation above, getting

$Tv = s_1 \langle v, e_1 \rangle Se_1 + \cdots + s_n \langle v, e_n \rangle Se_n$

for every $v \in V$. For each j, let $f_j = Se_j$. Because S is an isometry,
$(f_1, \dots, f_n)$ is an orthonormal basis of V (see 7.36). The equation above
now becomes

$Tv = s_1 \langle v, e_1 \rangle f_1 + \cdots + s_n \langle v, e_n \rangle f_n$

for every $v \in V$, completing the proof. ■
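For the operator T of 7.45, one possible choice of bases in 7.47 can be written down explicitly. The following worked example is an illustration added here; the ordering of the singular values and the choice of $f_3$ (any unit vector orthogonal to the other f's) are assumptions made for convenience.

```latex
\[
(s_1, s_2, s_3, s_4) = (3, 2, 0, 3), \qquad
e_j = \text{the $j$th standard basis vector of } \mathbf{F}^4,
\]
\[
f_1 = (0, 1, 0, 0), \quad f_2 = (0, 0, 1, 0), \quad
f_3 = (1, 0, 0, 0), \quad f_4 = (0, 0, 0, -1).
\]
With $v = (z_1, z_2, z_3, z_4)$, so that $\langle v, e_j \rangle = z_j$,
\[
s_1 \langle v, e_1 \rangle f_1 + \cdots + s_4 \langle v, e_4 \rangle f_4
= 3 z_1 f_1 + 2 z_2 f_2 + 0 \cdot z_3 f_3 + 3 z_4 f_4
= (0, 3 z_1, 2 z_2, -3 z_4) = Tv .
\]
```

The minus sign of the eigenvalue -3 is absorbed into $f_4$, which is how the decomposition keeps every $s_j$ nonnegative.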

When we worked with linear maps from one vector space to a second 
vector space, we considered the matrix of a linear map with respect 
to a basis for the first vector space and a basis for the second vector 
space. When dealing with operators, which are linear maps from a 
vector space to itself, we almost always use only one basis, making it 
play both roles. 

The singular-value decomposition allows us a rare opportunity to
use two different bases for the matrix of an operator. To do this,
suppose $T \in \mathcal{L}(V)$. Let $s_1, \dots, s_n$ denote the singular values of T, and let
$(e_1, \dots, e_n)$ and $(f_1, \dots, f_n)$ be orthonormal bases of V such that the
singular-value decomposition 7.47 holds. Then clearly

$\mathcal{M}\bigl(T, (e_1, \dots, e_n), (f_1, \dots, f_n)\bigr) = \begin{bmatrix} s_1 & & 0 \\ & \ddots & \\ 0 & & s_n \end{bmatrix}$.

In other words, every operator on V has a diagonal matrix with respect
to some orthonormal bases of V, provided that we are permitted to
use two different bases rather than a single basis as customary when
working with operators.

The proof of 7.46 illustrates the usefulness of the polar decomposition.

Singular values and the singular-value decomposition have many
applications (some are given in the exercises), including applications in
computational linear algebra. To compute numeric approximations to
the singular values of an operator T, first compute $T^*T$ and then
compute approximations to the eigenvalues of $T^*T$ (good techniques exist
for approximating eigenvalues of positive operators). The nonnegative
square roots of these (approximate) eigenvalues of $T^*T$ will be the
(approximate) singular values of T (as can be seen from the proof of 7.28).
In other words, the singular values of T can be approximated without
computing the square root of $T^*T$.
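The recipe in the last paragraph can be sketched for a real 2-by-2 matrix, where the eigenvalues of the symmetric matrix $T^*T$ are available in closed form from its characteristic polynomial. The closed form stands in for a general numerical eigenvalue routine, and the function name is illustrative; neither comes from the text.

```python
import math

def singular_values_2x2(t):
    """Singular values of a real 2-by-2 matrix, largest first."""
    (a, b), (c, d) = t
    # Entries of the symmetric matrix T*T = [[p, q], [q, r]].
    p, q, r = a * a + c * c, a * b + c * d, b * b + d * d
    # Eigenvalues of [[p, q], [q, r]]: mean of the diagonal plus or minus
    # a radius, as given by the quadratic formula.
    mean = (p + r) / 2.0
    radius = math.hypot((p - r) / 2.0, q)
    eigenvalues = (mean + radius, max(mean - radius, 0.0))
    # Nonnegative square roots of the eigenvalues of T*T.
    return tuple(math.sqrt(x) for x in eigenvalues)

s_max, s_min = singular_values_2x2([[1.0, 2.0], [3.0, 4.0]])
```

Two quick consistency checks: the product of the singular values equals the absolute value of the determinant of T, and the sum of their squares equals the sum of the squares of the entries of T. Note that the square root of the operator $T^*T$ is never formed.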
Exercises

1. Make $\mathcal{P}_2(\mathbf{R})$ into an inner-product space by defining

$\langle p, q \rangle = \int_0^1 p(x) q(x)\,dx$.

Define $T \in \mathcal{L}(\mathcal{P}_2(\mathbf{R}))$ by $T(a_0 + a_1 x + a_2 x^2) = a_1 x$.

(a) Show that T is not self-adjoint.

(b) The matrix of T with respect to the basis $(1, x, x^2)$ is

$\begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}$.

This matrix equals its conjugate transpose, even though T
is not self-adjoint. Explain why this is not a contradiction.

2. Prove or give a counterexample: the product of any two self- 
adjoint operators on a finite-dimensional inner-product space is 
self-adjoint. 

3. (a) Show that if V is a real inner-product space, then the set
of self-adjoint operators on V is a subspace of $\mathcal{L}(V)$.

(b) Show that if V is a complex inner-product space, then the
set of self-adjoint operators on V is not a subspace of
$\mathcal{L}(V)$.

4. Suppose $P \in \mathcal{L}(V)$ is such that $P^2 = P$. Prove that P is an
orthogonal projection if and only if P is self-adjoint.

5. Show that if $\dim V \ge 2$, then the set of normal operators on V is
not a subspace of $\mathcal{L}(V)$.

6. Prove that if $T \in \mathcal{L}(V)$ is normal, then

$\operatorname{range} T = \operatorname{range} T^*$.

7. Prove that if $T \in \mathcal{L}(V)$ is normal, then

$\operatorname{null} T^k = \operatorname{null} T$ and $\operatorname{range} T^k = \operatorname{range} T$

for every positive integer k.

8. Prove that there does not exist a self-adjoint operator $T \in \mathcal{L}(\mathbf{R}^3)$
such that $T(1, 2, 3) = (0, 0, 0)$ and $T(2, 5, 7) = (2, 5, 7)$.

9. Prove that a normal operator on a complex inner-product space
is self-adjoint if and only if all its eigenvalues are real.

10. Suppose V is a complex inner-product space and $T \in \mathcal{L}(V)$ is a
normal operator such that $T^9 = T^8$. Prove that T is self-adjoint
and $T^2 = T$.

11. Suppose V is a complex inner-product space. Prove that every
normal operator on V has a square root. (An operator $S \in \mathcal{L}(V)$
is called a square root of $T \in \mathcal{L}(V)$ if $S^2 = T$.)

12. Give an example of a real inner-product space V and $T \in \mathcal{L}(V)$
and real numbers $\alpha, \beta$ with $\alpha^2 < 4\beta$ such that $T^2 + \alpha T + \beta I$ is
not invertible.

13. Prove or give a counterexample: every self-adjoint operator on
V has a cube root. (An operator $S \in \mathcal{L}(V)$ is called a cube root
of $T \in \mathcal{L}(V)$ if $S^3 = T$.)

14. Suppose $T \in \mathcal{L}(V)$ is self-adjoint, $\lambda \in \mathbf{F}$, and $\epsilon > 0$. Prove that if
there exists $v \in V$ such that $\|v\| = 1$ and

$\|Tv - \lambda v\| < \epsilon$,

then T has an eigenvalue $\lambda'$ such that $|\lambda - \lambda'| < \epsilon$.

15. Suppose U is a finite-dimensional real vector space and
$T \in \mathcal{L}(U)$. Prove that U has a basis consisting of eigenvectors of T if
and only if there is an inner product on U that makes T into a
self-adjoint operator.

16. Give an example of an operator T on an inner product space such
that T has an invariant subspace whose orthogonal complement
is not invariant under T.

17. Prove that the sum of any two positive operators on V is positive.

18. Prove that if $T \in \mathcal{L}(V)$ is positive, then so is $T^k$ for every positive
integer k.

Exercise 9 strengthens the analogy (for normal operators) between
self-adjoint operators and real numbers.

Exercise 12 shows that the hypothesis that T is self-adjoint is
needed in 7.11, even for real vector spaces.

Exercise 16 shows that 7.18 can fail without the hypothesis
that T is normal.
19. Suppose that T is a positive operator on V. Prove that T is
invertible if and only if

$\langle Tv, v \rangle > 0$

for every $v \in V \setminus \{0\}$.

20. Prove or disprove: the identity operator on $\mathbf{F}^2$ has infinitely many
self-adjoint square roots.

21. Prove or give a counterexample: if $S \in \mathcal{L}(V)$ and there exists
an orthonormal basis $(e_1, \dots, e_n)$ of V such that $\|Se_j\| = 1$ for
each $e_j$, then S is an isometry.

22. Prove that if $S \in \mathcal{L}(\mathbf{R}^3)$ is an isometry, then there exists a nonzero
vector $x \in \mathbf{R}^3$ such that $S^2 x = x$.

23. Define $T \in \mathcal{L}(\mathbf{F}^3)$ by

$T(z_1, z_2, z_3) = (z_3, 2z_1, 3z_2)$.

Find (explicitly) an isometry $S \in \mathcal{L}(\mathbf{F}^3)$ such that $T = S\sqrt{T^*T}$.

24. Suppose $T \in \mathcal{L}(V)$, $S \in \mathcal{L}(V)$ is an isometry, and $R \in \mathcal{L}(V)$ is a
positive operator such that $T = SR$. Prove that $R = \sqrt{T^*T}$.

25. Suppose $T \in \mathcal{L}(V)$. Prove that T is invertible if and only if there
exists a unique isometry $S \in \mathcal{L}(V)$ such that $T = S\sqrt{T^*T}$.

26. Prove that if $T \in \mathcal{L}(V)$ is self-adjoint, then the singular values
of T equal the absolute values of the eigenvalues of T (repeated
appropriately).

27. Prove or give a counterexample: if $T \in \mathcal{L}(V)$, then the singular
values of $T^2$ equal the squares of the singular values of T.

28. Suppose $T \in \mathcal{L}(V)$. Prove that T is invertible if and only if 0 is
not a singular value of T.

29. Suppose $T \in \mathcal{L}(V)$. Prove that $\dim \operatorname{range} T$ equals the number of
nonzero singular values of T.

30. Suppose $S \in \mathcal{L}(V)$. Prove that S is an isometry if and only if all
the singular values of S equal 1.

Exercise 24 shows that if we write T as the product of an isometry
and a positive operator (as in the polar decomposition), then
the positive operator must equal $\sqrt{T^*T}$.

31. Suppose $T_1, T_2 \in \mathcal{L}(V)$. Prove that $T_1$ and $T_2$ have the same
singular values if and only if there exist isometries $S_1, S_2 \in \mathcal{L}(V)$
such that $T_1 = S_1 T_2 S_2$.

32. Suppose $T \in \mathcal{L}(V)$ has singular-value decomposition given by

$Tv = s_1 \langle v, e_1 \rangle f_1 + \cdots + s_n \langle v, e_n \rangle f_n$

for every $v \in V$, where $s_1, \dots, s_n$ are the singular values of T and
$(e_1, \dots, e_n)$ and $(f_1, \dots, f_n)$ are orthonormal bases of V.

(a) Prove that

$T^*v = s_1 \langle v, f_1 \rangle e_1 + \cdots + s_n \langle v, f_n \rangle e_n$

for every $v \in V$.

(b) Prove that if T is invertible, then

$T^{-1}v = \frac{\langle v, f_1 \rangle e_1}{s_1} + \cdots + \frac{\langle v, f_n \rangle e_n}{s_n}$

for every $v \in V$.

33. Suppose $T \in \mathcal{L}(V)$. Let $\hat{s}$ denote the smallest singular value of T,
and let s denote the largest singular value of T. Prove that

$\hat{s}\|v\| \le \|Tv\| \le s\|v\|$

for every $v \in V$.

34. Suppose $T', T'' \in \mathcal{L}(V)$. Let $s'$ denote the largest singular value
of $T'$, let $s''$ denote the largest singular value of $T''$, and let s
denote the largest singular value of $T' + T''$. Prove that $s \le s' + s''$.



Chapter 8 


Operators on 
Complex Vector Spaces 


In this chapter we delve deeper into the structure of operators on
complex vector spaces. An inner product does not help with this
material, so we return to the general setting of a finite-dimensional vector
space (as opposed to the more specialized context of an inner-product
space). Thus our assumptions for this chapter are as follows:

Recall that F denotes R or C. 

Also, V is a finite-dimensional, nonzero vector space over F. 

Some of the results in this chapter are valid on real vector spaces, 
so we have not assumed that V is a complex vector space. Most of the 
results in this chapter that are proved only for complex vector spaces 
have analogous results on real vector spaces that are proved in the next 
chapter. We deal with complex vector spaces first because the proofs 
on complex vector spaces are often simpler than the analogous proofs 
on real vector spaces. 
Generalized Eigenvectors

Unfortunately some operators do not have enough eigenvectors to
lead to a good description. Thus in this section we introduce the
concept of generalized eigenvectors, which will play a major role in our
description of the structure of an operator.

To understand why we need more than eigenvectors, let’s examine
the question of describing an operator by decomposing its domain into
invariant subspaces. Fix $T \in \mathcal{L}(V)$. We seek to describe T by finding a
“nice” direct sum decomposition

8.1 $V = U_1 \oplus \cdots \oplus U_m$,

where each $U_j$ is a subspace of V invariant under T. The simplest
possible nonzero invariant subspaces are one-dimensional. A
decomposition 8.1 where each $U_j$ is a one-dimensional subspace of V invariant
under T is possible if and only if V has a basis consisting of eigenvectors
of T (see 5.21). This happens if and only if V has the decomposition

8.2 $V = \operatorname{null}(T - \lambda_1 I) \oplus \cdots \oplus \operatorname{null}(T - \lambda_m I)$,

where $\lambda_1, \dots, \lambda_m$ are the distinct eigenvalues of T (see 5.21).

In the last chapter we showed that a decomposition of the form
8.2 holds for every self-adjoint operator on an inner-product space
(see 7.14). Sadly, a decomposition of the form 8.2 may not hold for
more general operators, even on a complex vector space. An
example was given by the operator in 5.19, which does not have enough
eigenvectors for 8.2 to hold. Generalized eigenvectors, which we now
introduce, will remedy this situation. Our main goal in this chapter is
to show that if V is a complex vector space and $T \in \mathcal{L}(V)$, then

$V = \operatorname{null}(T - \lambda_1 I)^{\dim V} \oplus \cdots \oplus \operatorname{null}(T - \lambda_m I)^{\dim V}$,

where $\lambda_1, \dots, \lambda_m$ are the distinct eigenvalues of T (see 8.23).

Suppose $T \in \mathcal{L}(V)$ and $\lambda$ is an eigenvalue of T. A vector $v \in V$ is
called a generalized eigenvector of T corresponding to $\lambda$ if

8.3 $(T - \lambda I)^j v = 0$

for some positive integer j. Note that every eigenvector of T is a
generalized eigenvector of T (take $j = 1$ in the equation above), but the
converse is not true. For example, if $T \in \mathcal{L}(\mathbf{C}^3)$ is defined by

$T(z_1, z_2, z_3) = (z_2, 0, z_3)$,

then $T^2(z_1, z_2, 0) = 0$ for all $z_1, z_2 \in \mathbf{C}$. Hence every element of $\mathbf{C}^3$
whose last coordinate equals 0 is a generalized eigenvector of T. As
you should verify,

$\mathbf{C}^3 = \{(z_1, z_2, 0) : z_1, z_2 \in \mathbf{C}\} \oplus \{(0, 0, z_3) : z_3 \in \mathbf{C}\}$,

where the first subspace on the right equals the set of generalized
eigenvectors for this operator corresponding to the eigenvalue 0 and the
second subspace on the right equals the set of generalized eigenvectors
corresponding to the eigenvalue 1. Later in this chapter we will prove
that a decomposition using generalized eigenvectors exists for every
operator on a complex vector space (see 8.23).
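The example above is easy to check by direct computation. The sketch below applies $T - \lambda I$ to sample vectors; the particular vectors and the helper name `T_minus` are chosen only for illustration.

```python
def T(z):
    """The operator T(z1, z2, z3) = (z2, 0, z3) from the example."""
    z1, z2, z3 = z
    return (z2, 0, z3)

def T_minus(lam, z):
    """Apply T - lam*I to the vector z."""
    return tuple(t - lam * x for t, x in zip(T(z), z))

v = (5, 7, 0)
once = T_minus(0, v)       # nonzero, so v is not an eigenvector for 0
twice = T_minus(0, once)   # zero, so v is a generalized eigenvector for 0

w = (0, 0, 4)
killed = T_minus(1, w)     # zero, so w is an eigenvector for eigenvalue 1
```

Here `once` is `(7, 0, 0)` while `twice` and `killed` are both the zero vector, so the definition 8.3 holds with $j = 2$ for v and with $j = 1$ for w.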

Though j is allowed to be an arbitrary positive integer in the definition of a
generalized eigenvector, we will soon see that every generalized
eigenvector satisfies an equation of the form 8.3 with j equal to the
dimension of V. To prove this, we now turn to a study of null spaces of
powers of an operator.

Suppose $T \in \mathcal{L}(V)$ and k is a nonnegative integer. If $T^k v = 0$, then
$T^{k+1} v = T(T^k v) = T(0) = 0$. Thus $\operatorname{null} T^k \subset \operatorname{null} T^{k+1}$. In other words,
we have

8.4 $\{0\} = \operatorname{null} T^0 \subset \operatorname{null} T^1 \subset \cdots \subset \operatorname{null} T^k \subset \operatorname{null} T^{k+1} \subset \cdots$.

The next proposition says that once two consecutive terms in this
sequence of subspaces are equal, then all later terms in the sequence are
equal.

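8.5 Proposition: If $T \in \mathcal{L}(V)$ and m is a nonnegative integer such
that $\operatorname{null} T^m = \operatorname{null} T^{m+1}$, then

$\operatorname{null} T^0 \subset \operatorname{null} T^1 \subset \cdots \subset \operatorname{null} T^m = \operatorname{null} T^{m+1} = \operatorname{null} T^{m+2} = \cdots$.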

Proof: Suppose $T \in \mathcal{L}(V)$ and m is a nonnegative integer such
that $\operatorname{null} T^m = \operatorname{null} T^{m+1}$. Let k be a positive integer. We want to prove
that

$\operatorname{null} T^{m+k} = \operatorname{null} T^{m+k+1}$.

We already know that $\operatorname{null} T^{m+k} \subset \operatorname{null} T^{m+k+1}$. To prove the inclusion
in the other direction, suppose that $v \in \operatorname{null} T^{m+k+1}$. Then

$0 = T^{m+k+1} v = T^{m+1}(T^k v)$.

Hence

$T^k v \in \operatorname{null} T^{m+1} = \operatorname{null} T^m$.

Thus $0 = T^m(T^k v) = T^{m+k} v$, which means that $v \in \operatorname{null} T^{m+k}$. This
implies that $\operatorname{null} T^{m+k+1} \subset \operatorname{null} T^{m+k}$, completing the proof. ■

Note that we do not define the concept of a generalized eigenvalue
because this would not lead to anything new. Reason: if $(T - \lambda I)^j$ is
not injective for some positive integer j, then $T - \lambda I$ is not injective,
and hence $\lambda$ is an eigenvalue of T.

Corollary 8.7 implies that the set of generalized eigenvectors of
$T \in \mathcal{L}(V)$ corresponding to an eigenvalue $\lambda$ is a subspace of V.

The proposition above raises the question of whether there must
exist a nonnegative integer m such that $\operatorname{null} T^m = \operatorname{null} T^{m+1}$. The
proposition below shows that this equality holds at least when m equals the
dimension of the vector space on which T operates.

8.6 Proposition: If $T \in \mathcal{L}(V)$, then

$\operatorname{null} T^{\dim V} = \operatorname{null} T^{\dim V + 1} = \operatorname{null} T^{\dim V + 2} = \cdots$.

Proof: Suppose $T \in \mathcal{L}(V)$. To get our desired conclusion, we need
only prove that $\operatorname{null} T^{\dim V} = \operatorname{null} T^{\dim V + 1}$ (by 8.5). Suppose this is not
true. Then, by 8.5, we have

$\{0\} = \operatorname{null} T^0 \subsetneq \operatorname{null} T^1 \subsetneq \cdots \subsetneq \operatorname{null} T^{\dim V} \subsetneq \operatorname{null} T^{\dim V + 1}$,

where the symbol $\subsetneq$ means “contained in but not equal to”. At each of
the strict inclusions in the chain above, the dimension must increase by
at least 1. Thus $\dim \operatorname{null} T^{\dim V + 1} \ge \dim V + 1$, a contradiction because
a subspace of V cannot have a larger dimension than dim V. ■
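The stabilization described by 8.5 and 8.6 can be observed by computing $\dim \operatorname{null} T^k$ for successive k. The sketch below does this for operators given by matrices with rational entries, using $\dim \operatorname{null} T^k = \dim V - \operatorname{rank} T^k$ (this is 3.4). The exact Gaussian elimination over `Fraction` and the function names are choices made for this illustration.

```python
from fractions import Fraction

def rank(matrix):
    """Rank of a matrix, by Gaussian elimination over exact rationals."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_cols = len(rows[0]) if rows else 0
    found = 0  # number of pivots found so far
    for col in range(n_cols):
        pivot = next((r for r in range(found, len(rows))
                      if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[found], rows[pivot] = rows[pivot], rows[found]
        for r in range(found + 1, len(rows)):
            factor = rows[r][col] / rows[found][col]
            rows[r] = [x - factor * y for x, y in zip(rows[r], rows[found])]
        found += 1
    return found

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def null_dimensions(matrix):
    """List of dim null T^k for k = 0, 1, ..., dim V + 1."""
    n = len(matrix)
    power = [[int(i == j) for j in range(n)] for i in range(n)]  # T^0 = I
    dims = []
    for _ in range(n + 2):
        dims.append(n - rank(power))
        power = matmul(power, matrix)
    return dims

# Matrix of T(z1, z2, z3) = (z2, 0, z3) with respect to the standard basis.
chain = null_dimensions([[0, 1, 0],
                         [0, 0, 0],
                         [0, 0, 1]])
```

For this operator the dimensions are 0, 1, 2, 2, 2: the chain grows strictly until two consecutive terms agree and is constant from then on, well before the power reaches dim V + 1.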

Now we have the promised description of generalized eigenvectors. 

8.7 Corollary: Suppose $T \in \mathcal{L}(V)$ and $\lambda$ is an eigenvalue of T. Then
the set of generalized eigenvectors of T corresponding to $\lambda$ equals
$\operatorname{null}(T - \lambda I)^{\dim V}$.

Proof: If $v \in \operatorname{null}(T - \lambda I)^{\dim V}$, then clearly v is a generalized
eigenvector of T corresponding to $\lambda$ (by the definition of generalized
eigenvector).

Conversely, suppose that $v \in V$ is a generalized eigenvector of T
corresponding to $\lambda$. Thus there is a positive integer j such that

$v \in \operatorname{null}(T - \lambda I)^j$.

From 8.5 and 8.6 (with $T - \lambda I$ replacing T), we get $v \in \operatorname{null}(T - \lambda I)^{\dim V}$,
as desired. ■

An operator is called nilpotent if some power of it equals 0. For
example, the operator $N \in \mathcal{L}(\mathbf{F}^4)$ defined by

$N(z_1, z_2, z_3, z_4) = (z_3, z_4, 0, 0)$

is nilpotent because $N^2 = 0$. As another example, the operator of
differentiation on $\mathcal{P}_m(\mathbf{R})$ is nilpotent because the $(m + 1)$st derivative of
any polynomial of degree at most m equals 0. Note that on this space of
dimension $m + 1$, we need to raise the nilpotent operator to the power
$m + 1$ to get 0. The next corollary shows that we never need to use a
power higher than the dimension of the space.

8.8 Corollary: Suppose $N \in \mathcal{L}(V)$ is nilpotent. Then $N^{\dim V} = 0$.

Proof: Because $N$ is nilpotent, every vector in $V$ is a generalized
eigenvector corresponding to the eigenvalue 0. Thus from 8.7 we see
that $\operatorname{null} N^{\dim V} = V$, as desired. ■
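Both nilpotent examples above can be checked with a few lines of matrix arithmetic. This is a sketch under stated assumptions: the matrices are taken with respect to the standard basis of $\mathbf{F}^4$ and the basis $(1, x, \dots, x^m)$ of $\mathcal{P}_m(\mathbf{R})$, the value `m = 3` is an arbitrary choice, and the helper names are illustrative.

```python
def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def power(a, k):
    """Return a^k for a nonnegative integer k (a^0 is the identity)."""
    n = len(a)
    result = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = matmul(result, a)
    return result

def is_zero(a):
    return all(x == 0 for row in a for x in row)

# N(z1, z2, z3, z4) = (z3, z4, 0, 0) with respect to the standard basis.
N = [[0, 0, 1, 0],
     [0, 0, 0, 1],
     [0, 0, 0, 0],
     [0, 0, 0, 0]]

# Differentiation on polynomials of degree at most m, basis (1, x, ..., x^m):
# column j holds the coordinates of the derivative of x^j, namely j * x^(j-1).
m = 3
D = [[j if i == j - 1 else 0 for j in range(m + 1)] for i in range(m + 1)]

print(is_zero(power(N, 2)))      # N^2 = 0
print(is_zero(power(D, m)))      # D^m is not yet 0
print(is_zero(power(D, m + 1)))  # D^(m+1) = 0, and m + 1 is the dimension
```

The differentiation operator shows that the bound in 8.8 cannot be improved in general: on a space of dimension $m + 1$ the power $m$ is not enough.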

Having dealt with null spaces of powers of operators, we now turn
our attention to ranges. Suppose $T \in \mathcal{L}(V)$ and $k$ is a nonnegative
integer. If $w \in \operatorname{range} T^{k+1}$, then there exists $v \in V$ with

\[ w = T^{k+1} v = T^k(Tv) \in \operatorname{range} T^k. \]

Thus $\operatorname{range} T^{k+1} \subset \operatorname{range} T^k$. In other words, we have

\[ V = \operatorname{range} T^0 \supset \operatorname{range} T^1 \supset \cdots \supset \operatorname{range} T^k \supset \operatorname{range} T^{k+1} \supset \cdots. \]

The proposition below shows that the inclusions above become
equalities once the power reaches the dimension of $V$.

These inclusions go in the opposite direction from the corresponding
inclusions for null spaces (8.4).

8.9 Proposition: If $T \in \mathcal{L}(V)$, then

\[ \operatorname{range} T^{\dim V} = \operatorname{range} T^{\dim V+1} = \operatorname{range} T^{\dim V+2} = \cdots. \]

Proof: We could prove this from scratch, but instead let's make use
of the corresponding result already proved for null spaces. Suppose
$m > \dim V$. Then

\[
\begin{aligned}
\dim \operatorname{range} T^m &= \dim V - \dim \operatorname{null} T^m \\
&= \dim V - \dim \operatorname{null} T^{\dim V} \\
&= \dim \operatorname{range} T^{\dim V},
\end{aligned}
\]

where the first and third equalities come from 3.4 and the second
equality comes from 8.6. We already know that $\operatorname{range} T^{\dim V} \supset \operatorname{range} T^m$. We
just showed that $\dim \operatorname{range} T^{\dim V} = \dim \operatorname{range} T^m$, so this implies that
$\operatorname{range} T^{\dim V} = \operatorname{range} T^m$, as desired. ■

The Latin word nil means nothing or zero; the Latin word potent
means power. Thus nilpotent literally means zero power.


The Characteristic Polynomial

Suppose $V$ is a complex vector space and $T \in \mathcal{L}(V)$. We know that
$V$ has a basis with respect to which $T$ has an upper-triangular matrix
(see 5.13). In general, this matrix is not unique: $V$ may have many
different bases with respect to which $T$ has an upper-triangular matrix,
and with respect to these different bases we may get different
upper-triangular matrices. However, the diagonal of any such matrix must
contain precisely the eigenvalues of $T$ (see 5.18). Thus if $T$ has $\dim V$
distinct eigenvalues, then each one must appear exactly once on the
diagonal of any upper-triangular matrix of $T$.

What if $T$ has fewer than $\dim V$ distinct eigenvalues, as can easily
happen? Then each eigenvalue must appear at least once on the
diagonal of any upper-triangular matrix of $T$, but some of them must be
repeated. Could the number of times that a particular eigenvalue is
repeated depend on which basis of $V$ we choose?

If $T$ happens to have a diagonal matrix $A$ with respect to some basis,
then $\lambda$ appears on the diagonal of $A$ precisely $\dim \operatorname{null}(T - \lambda I)$ times,
as you should verify.

You might guess that a number $\lambda$ appears on the diagonal of an
upper-triangular matrix of $T$ precisely $\dim \operatorname{null}(T - \lambda I)$ times. In
general, this is false. For example, consider the operator on $\mathbf{C}^2$ whose
matrix with respect to the standard basis is the upper-triangular matrix

\[ \begin{bmatrix} 5 & 1 \\ 0 & 5 \end{bmatrix}. \]

For this operator, $\dim \operatorname{null}(T - 5I) = 1$ but 5 appears on the
diagonal twice. Note, however, that $\dim \operatorname{null}(T - 5I)^2 = 2$ for this
operator. This example illustrates the general situation: a number $\lambda$
appears on the diagonal of an upper-triangular matrix of $T$ precisely
$\dim \operatorname{null}(T - \lambda I)^{\dim V}$ times, as we will show in the following theorem.
Because $\operatorname{null}(T - \lambda I)^{\dim V}$ depends only on $T$ and $\lambda$ and not on a choice
of basis, this implies that the number of times an eigenvalue is repeated
on the diagonal of an upper-triangular matrix of $T$ is independent of
which particular basis we choose. This result will be our key tool in
analyzing the structure of an operator on a complex vector space.

8.10 Theorem: Let $T \in \mathcal{L}(V)$ and $\lambda \in \mathbf{F}$. Then for every basis of $V$
with respect to which $T$ has an upper-triangular matrix, $\lambda$ appears on
the diagonal of the matrix of $T$ precisely $\dim \operatorname{null}(T - \lambda I)^{\dim V}$ times.

Proof: We will assume, without loss of generality, that $\lambda = 0$ (once
the theorem is proved in this case, the general case is obtained by
replacing $T$ with $T - \lambda I$).

For convenience let $n = \dim V$. We will prove this theorem by
induction on $n$. Clearly the desired result holds if $n = 1$. Thus we can assume
that $n > 1$ and that the desired result holds on spaces of dimension
$n - 1$.

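Suppose $(v_1, \dots, v_n)$ is a basis of $V$ with respect to which $T$ has an
upper-triangular matrix

8.11
\[ \begin{bmatrix} \lambda_1 & & & * \\ & \ddots & & \\ & & \lambda_{n-1} & \\ 0 & & & \lambda_n \end{bmatrix}. \]

Let $U = \operatorname{span}(v_1, \dots, v_{n-1})$. Clearly $U$ is invariant under $T$ (see 5.12),
and the matrix of $T|_U$ with respect to the basis $(v_1, \dots, v_{n-1})$ is

8.12
\[ \begin{bmatrix} \lambda_1 & & * \\ & \ddots & \\ 0 & & \lambda_{n-1} \end{bmatrix}. \]

Thus, by our induction hypothesis, 0 appears on the diagonal of 8.12
$\dim \operatorname{null}(T|_U)^{n-1}$ times. We know that $\operatorname{null}(T|_U)^{n-1} = \operatorname{null}(T|_U)^n$
(because $U$ has dimension $n - 1$; see 8.6). Hence

8.13 0 appears on the diagonal of 8.12 $\dim \operatorname{null}(T|_U)^n$ times.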

The proof breaks into two cases, depending on whether $\lambda_n = 0$. First
consider the case where $\lambda_n \neq 0$. We will show that in this case

8.14
\[ \operatorname{null} T^n \subset U. \]

Once this has been verified, we will know that $\operatorname{null} T^n = \operatorname{null}(T|_U)^n$, and
hence 8.13 will tell us that 0 appears on the diagonal of 8.11 exactly
$\dim \operatorname{null} T^n$ times, completing the proof in the case where $\lambda_n \neq 0$.

Because $\mathcal{M}(T)$ is given by 8.11, we have

\[ \mathcal{M}(T^n) = \mathcal{M}(T)^n = \begin{bmatrix} \lambda_1^n & & & * \\ & \ddots & & \\ & & \lambda_{n-1}^n & \\ 0 & & & \lambda_n^n \end{bmatrix}. \]

(Recall that an asterisk is often used in matrices to denote entries that
we do not know or care about.) This shows that

\[ T^n v_n = u + \lambda_n^n v_n \]

for some $u \in U$. To prove 8.14 (still assuming that $\lambda_n \neq 0$), suppose
$v \in \operatorname{null} T^n$. We can write $v$ in the form

\[ v = \tilde{u} + a v_n, \]

where $\tilde{u} \in U$ and $a \in \mathbf{F}$. Thus

\[ 0 = T^n v = T^n \tilde{u} + a T^n v_n = T^n \tilde{u} + a u + a \lambda_n^n v_n. \]

Because $T^n \tilde{u}$ and $au$ are in $U$ and $v_n \notin U$, this implies that $a \lambda_n^n = 0$.
However, $\lambda_n \neq 0$, so $a = 0$. Thus $v = \tilde{u} \in U$, completing the proof
of 8.14.

Now consider the case where $\lambda_n = 0$. In this case we will show that

8.15
\[ \dim \operatorname{null} T^n = \dim \operatorname{null}(T|_U)^n + 1, \]

which along with 8.13 will complete the proof when $\lambda_n = 0$.

Using the formula for the dimension of the sum of two subspaces
(2.18), we have

\[
\begin{aligned}
\dim \operatorname{null} T^n &= \dim(U \cap \operatorname{null} T^n) + \dim(U + \operatorname{null} T^n) - \dim U \\
&= \dim \operatorname{null}(T|_U)^n + \dim(U + \operatorname{null} T^n) - (n - 1).
\end{aligned}
\]

Suppose we can prove that $\operatorname{null} T^n$ contains a vector not in $U$. Then

\[ n = \dim V \geq \dim(U + \operatorname{null} T^n) > \dim U = n - 1, \]

which implies that $\dim(U + \operatorname{null} T^n) = n$, which when combined with
the formula above for $\dim \operatorname{null} T^n$ gives 8.15, as desired. Thus to
complete the proof, we need only show that $\operatorname{null} T^n$ contains a vector not
in $U$.

Let's think about how we might find a vector in $\operatorname{null} T^n$ that is not
in $U$. We might try a vector of the form

\[ u - v_n, \]

where $u \in U$. At least we are guaranteed that any such vector is not
in $U$. Can we choose $u \in U$ such that the vector above is in $\operatorname{null} T^n$?
Let's compute:

\[ T^n(u - v_n) = T^n u - T^n v_n. \]

To make the above vector equal 0, we must choose (if possible) $u \in U$
such that $T^n u = T^n v_n$. We can do this if $T^n v_n \in \operatorname{range}(T|_U)^n$. Because
8.11 is the matrix of $T$ with respect to $(v_1, \dots, v_n)$, we see that $T v_n \in U$
(recall that we are considering the case where $\lambda_n = 0$). Thus

\[ T^n v_n = T^{n-1}(T v_n) \in \operatorname{range}(T|_U)^{n-1} = \operatorname{range}(T|_U)^n, \]

where the last equality comes from 8.9. In other words, we can indeed
choose $u \in U$ such that $u - v_n \in \operatorname{null} T^n$, completing the proof. ■


Suppose $T \in \mathcal{L}(V)$. The multiplicity of an eigenvalue $\lambda$ of $T$ is
defined to be the dimension of the subspace of generalized eigenvectors
corresponding to $\lambda$. In other words, the multiplicity of an eigenvalue $\lambda$
of $T$ equals $\dim \operatorname{null}(T - \lambda I)^{\dim V}$. If $T$ has an upper-triangular matrix
with respect to some basis of $V$ (as always happens when $\mathbf{F} = \mathbf{C}$), then
the multiplicity of $\lambda$ is simply the number of times $\lambda$ appears on the
diagonal of this matrix (by the last theorem).

Our definition of multiplicity has a clear connection with the
geometric behavior of $T$. Most texts define multiplicity in terms of
the multiplicity of the roots of a certain polynomial defined by
determinants. These two definitions turn out to be equivalent.

As an example of multiplicity, consider the operator $T \in \mathcal{L}(\mathbf{F}^3)$
defined by

8.16
\[ T(z_1, z_2, z_3) = (0, z_1, 5z_3). \]

You should verify that 0 is an eigenvalue of $T$ with multiplicity 2, that
5 is an eigenvalue of $T$ with multiplicity 1, and that $T$ has no additional
eigenvalues. As another example, if $T \in \mathcal{L}(\mathbf{F}^3)$ is the operator whose
matrix is

8.17
\[ \begin{bmatrix} 6 & 7 & 7 \\ 0 & 6 & 7 \\ 0 & 0 & 7 \end{bmatrix}, \]

then 6 is an eigenvalue of $T$ with multiplicity 2 and 7 is an eigenvalue
of $T$ with multiplicity 1 (this follows from the last theorem).

In each of the examples above, the sum of the multiplicities of the
eigenvalues of $T$ equals 3, which is the dimension of the domain of $T$.
The next proposition shows that this always happens on a complex
vector space.

8.18 Proposition: If $V$ is a complex vector space and $T \in \mathcal{L}(V)$, then
the sum of the multiplicities of all the eigenvalues of $T$ equals $\dim V$.

Proof: Suppose $V$ is a complex vector space and $T \in \mathcal{L}(V)$. Then
there is a basis of $V$ with respect to which the matrix of $T$ is upper
triangular (by 5.13). The multiplicity of $\lambda$ equals the number of times $\lambda$
appears on the diagonal of this matrix (from 8.10). Because the diagonal
of this matrix has length $\dim V$, the sum of the multiplicities of all the
eigenvalues of $T$ must equal $\dim V$. ■
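The multiplicities claimed for the operators 8.16 and 8.17 can be computed directly from the definition, as $\dim V$ minus the rank of $(T - \lambda I)^{\dim V}$. The sketch below assumes the matrices are taken with respect to the standard basis of $\mathbf{F}^3$; the helper names `matmul`, `rank`, and `multiplicity` are illustrative, not from the text.

```python
from fractions import Fraction

def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def rank(rows):
    """Rank via Gauss-Jordan elimination with exact rational arithmetic."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0  # number of pivots found so far
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [x - f * y for x, y in zip(m[i], m[r])]
        r += 1
    return r

def multiplicity(t, lam):
    """dim null (T - lam I)^n, where n is the size of the square matrix t."""
    n = len(t)
    shifted = [[t[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]
    power = shifted
    for _ in range(n - 1):  # raise the shifted matrix to the n-th power
        power = matmul(power, shifted)
    return n - rank(power)

# 8.16: T(z1, z2, z3) = (0, z1, 5 z3) with respect to the standard basis.
T_816 = [[0, 0, 0],
         [1, 0, 0],
         [0, 0, 5]]

# 8.17: the upper-triangular matrix from the text.
T_817 = [[6, 7, 7],
         [0, 6, 7],
         [0, 0, 7]]

print(multiplicity(T_816, 0), multiplicity(T_816, 5))  # prints 2 1
print(multiplicity(T_817, 6), multiplicity(T_817, 7))  # prints 2 1
```

In both cases the multiplicities add up to 3, in agreement with 8.18. A number that is not an eigenvalue, such as 3 for the matrix 8.17, gets multiplicity 0 from the same function.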


Suppose $V$ is a complex vector space and $T \in \mathcal{L}(V)$. Let $\lambda_1, \dots, \lambda_m$
denote the distinct eigenvalues of $T$. Let $d_j$ denote the multiplicity
of $\lambda_j$ as an eigenvalue of $T$. The polynomial

\[ (z - \lambda_1)^{d_1} \cdots (z - \lambda_m)^{d_m} \]

is called the characteristic polynomial of $T$. Note that the degree of
the characteristic polynomial of $T$ equals $\dim V$ (from 8.18). Obviously
the roots of the characteristic polynomial of $T$ equal the eigenvalues
of $T$. As an example, the characteristic polynomial of the operator
$T \in \mathcal{L}(\mathbf{C}^3)$ defined by 8.16 equals $z^2(z - 5)$.

Most texts define the characteristic polynomial using determinants. The
approach taken here, which is considerably simpler, leads to an easy
proof of the Cayley-Hamilton theorem.

Here is another description of the characteristic polynomial of an
operator on a complex vector space. Suppose $V$ is a complex vector
space and $T \in \mathcal{L}(V)$. Consider any basis of $V$ with respect to which $T$
has an upper-triangular matrix of the form

8.19
\[ \mathcal{M}(T) = \begin{bmatrix} \lambda_1 & & * \\ & \ddots & \\ 0 & & \lambda_n \end{bmatrix}. \]

Then the characteristic polynomial of $T$ is given by

\[ (z - \lambda_1) \cdots (z - \lambda_n); \]

this follows immediately from 8.10. As an example of this procedure,
if $T \in \mathcal{L}(\mathbf{C}^3)$ is the operator whose matrix is given by 8.17, then the
characteristic polynomial of $T$ equals $(z - 6)^2(z - 7)$.

In the next chapter we will define the characteristic polynomial of
an operator on a real vector space and prove that the next result also
holds for real vector spaces.

8.20 Cayley-Hamilton Theorem: Suppose that $V$ is a complex vector
space and $T \in \mathcal{L}(V)$. Let $q$ denote the characteristic polynomial of $T$.
Then $q(T) = 0$.

Proof: Suppose $(v_1, \dots, v_n)$ is a basis of $V$ with respect to which
the matrix of $T$ has the upper-triangular form 8.19. To prove that
$q(T) = 0$, we need only show that $q(T)v_j = 0$ for $j = 1, \dots, n$. To
do this, it suffices to show that

8.21
\[ (T - \lambda_1 I) \cdots (T - \lambda_j I) v_j = 0 \]

for $j = 1, \dots, n$.

We will prove 8.21 by induction on $j$. To get started, suppose $j = 1$.
Because $\mathcal{M}(T, (v_1, \dots, v_n))$ is given by 8.19, we have $T v_1 = \lambda_1 v_1$, giving
8.21 when $j = 1$.

Now suppose that $1 < j \leq n$ and that

\[
\begin{aligned}
0 &= (T - \lambda_1 I) v_1 \\
&= (T - \lambda_1 I)(T - \lambda_2 I) v_2 \\
&\;\;\vdots \\
&= (T - \lambda_1 I) \cdots (T - \lambda_{j-1} I) v_{j-1}.
\end{aligned}
\]

Because $\mathcal{M}(T, (v_1, \dots, v_n))$ is given by 8.19, we see that

\[ (T - \lambda_j I) v_j \in \operatorname{span}(v_1, \dots, v_{j-1}). \]

Thus, by our induction hypothesis, $(T - \lambda_1 I) \cdots (T - \lambda_{j-1} I)$ applied to
$(T - \lambda_j I) v_j$ gives 0. In other words, 8.21 holds, completing the proof. ■

The English mathematician Arthur Cayley published three mathematics
papers before he completed his undergraduate degree in 1842. The
Irish mathematician William Hamilton was made a professor in 1827
when he was 22 years old and still an undergraduate!
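As a concrete check of the Cayley-Hamilton theorem, the operator with matrix 8.17 has characteristic polynomial $(z - 6)^2(z - 7)$, so $(T - 6I)^2(T - 7I)$ should be the zero matrix. The sketch below verifies this by direct multiplication; the helper names `matmul` and `shift` are illustrative.

```python
def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def shift(t, lam):
    """Return the matrix of T - lam I."""
    n = len(t)
    return [[t[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]

# The matrix 8.17 from the text.
T = [[6, 7, 7],
     [0, 6, 7],
     [0, 0, 7]]

A = shift(T, 6)  # T - 6I
B = shift(T, 7)  # T - 7I

q_of_T = matmul(matmul(A, A), B)  # (T - 6I)^2 (T - 7I)
lower = matmul(A, B)              # (T - 6I)(T - 7I), one factor fewer

print(q_of_T)  # prints [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
print(lower)   # a nonzero matrix
```

Dropping one factor of $T - 6I$ leaves a nonzero operator, so for this particular matrix no proper divisor of the form $(z - 6)(z - 7)$ annihilates $T$.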

Decomposition of an Operator

We saw earlier that the domain of an operator might not decompose
into invariant subspaces consisting of eigenvectors of the operator,
even on a complex vector space. In this section we will see that every
operator on a complex vector space has enough generalized
eigenvectors to provide a decomposition.

We observed earlier that if $T \in \mathcal{L}(V)$, then $\operatorname{null} T$ is invariant
under $T$. Now we show that the null space of any polynomial of $T$ is also
invariant under $T$.

8.22 Proposition: If $T \in \mathcal{L}(V)$ and $p \in \mathcal{P}(\mathbf{F})$, then $\operatorname{null} p(T)$ is
invariant under $T$.

Proof: Suppose $T \in \mathcal{L}(V)$ and $p \in \mathcal{P}(\mathbf{F})$. Let $v \in \operatorname{null} p(T)$. Then
$p(T)v = 0$. Thus

\[ (p(T))(Tv) = T(p(T)v) = T(0) = 0, \]

and hence $Tv \in \operatorname{null} p(T)$. Thus $\operatorname{null} p(T)$ is invariant under $T$, as
desired. ■

The following major structure theorem shows that every operator on 
a complex vector space can be thought of as composed of pieces, each 
of which is a nilpotent operator plus a scalar multiple of the identity. 
Actually we have already done all the hard work, so at this point the 
proof is easy. 

8.23 Theorem: Suppose $V$ is a complex vector space and $T \in \mathcal{L}(V)$.
Let $\lambda_1, \dots, \lambda_m$ be the distinct eigenvalues of $T$, and let $U_1, \dots, U_m$ be
the corresponding subspaces of generalized eigenvectors. Then

(a) $V = U_1 \oplus \cdots \oplus U_m$;

(b) each $U_j$ is invariant under $T$;

(c) each $(T - \lambda_j I)|_{U_j}$ is nilpotent.

Proof: Note that $U_j = \operatorname{null}(T - \lambda_j I)^{\dim V}$ for each $j$ (by 8.7). From
8.22 (with $p(z) = (z - \lambda_j)^{\dim V}$), we get (b). Obviously (c) follows from
the definitions.

To prove (a), recall that the multiplicity of $\lambda_j$ as an eigenvalue of $T$
is defined to be $\dim U_j$. The sum of these multiplicities equals $\dim V$
(see 8.18); thus

8.24
\[ \dim V = \dim U_1 + \cdots + \dim U_m. \]

Let $U = U_1 + \cdots + U_m$. Clearly $U$ is invariant under $T$. Thus we can
define $S \in \mathcal{L}(U)$ by

\[ S = T|_U. \]

Note that $S$ has the same eigenvalues, with the same multiplicities, as $T$
because all the generalized eigenvectors of $T$ are in $U$, the domain of $S$.
Thus applying 8.18 to $S$, we get

\[ \dim U = \dim U_1 + \cdots + \dim U_m. \]

This equation, along with 8.24, shows that $\dim V = \dim U$. Because $U$
is a subspace of $V$, this implies that $V = U$. In other words,

\[ V = U_1 + \cdots + U_m. \]

This equation, along with 8.24, allows us to use 2.19 to conclude that
(a) holds, completing the proof. ■

As we know, an operator on a complex vector space may not have 
enough eigenvectors to form a basis for the domain. The next result 
shows that on a complex vector space there are enough generalized 
eigenvectors to do this. 

8.25 Corollary: Suppose $V$ is a complex vector space and $T \in \mathcal{L}(V)$.
Then there is a basis of $V$ consisting of generalized eigenvectors of $T$.

Proof: Choose a basis for each $U_j$ in 8.23. Put all these bases
together to form a basis of $V$ consisting of generalized eigenvectors
of $T$. ■


Given an operator T on V, we want to find a basis of V so that the 
matrix of T with respect to this basis is as simple as possible, meaning 
that the matrix contains many 0’s. We begin by showing that if N is 
nilpotent, we can choose a basis of V such that the matrix of N with 
respect to this basis has more than half of its entries equal to 0. 


8.26 Lemma: Suppose $N$ is a nilpotent operator on $V$. Then there is
a basis of $V$ with respect to which the matrix of $N$ has the form

\[ \begin{bmatrix} 0 & & * \\ & \ddots & \\ 0 & & 0 \end{bmatrix}; \]

here all entries on and below the diagonal are 0's.

Proof: First choose a basis of $\operatorname{null} N$. Then extend this to a basis
of $\operatorname{null} N^2$. Then extend to a basis of $\operatorname{null} N^3$. Continue in this fashion,
eventually getting a basis of $V$ (because $\operatorname{null} N^m = V$ for $m$ sufficiently
large).

Now let's think about the matrix of $N$ with respect to this basis. The
first column, and perhaps additional columns at the beginning, consists
of all 0's because the corresponding basis vectors are in $\operatorname{null} N$. The
next set of columns comes from basis vectors in $\operatorname{null} N^2$. Applying $N$
to any such vector, we get a vector in $\operatorname{null} N$; in other words, we get a
vector that is a linear combination of the previous basis vectors. Thus
all nonzero entries in these columns must lie above the diagonal. The
next set of columns comes from basis vectors in $\operatorname{null} N^3$. Applying $N$
to any such vector, we get a vector in $\operatorname{null} N^2$; in other words, we get a
vector that is a linear combination of the previous basis vectors. Thus,
once again, all nonzero entries in these columns must lie above the
diagonal. Continue in this fashion to complete the proof. ■

If $V$ is a complex vector space, a proof of this lemma follows easily
from Exercise 6 in this chapter, 5.13, and 5.18. But the proof given
here uses simpler ideas than needed to prove 5.13, and it works for
both real and complex vector spaces.


Note that in the next theorem we get many more zeros in the matrix 
of T than are needed to make it upper triangular. 


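8.28 Theorem: Suppose $V$ is a complex vector space and $T \in \mathcal{L}(V)$.
Let $\lambda_1, \dots, \lambda_m$ be the distinct eigenvalues of $T$. Then there is a basis
of $V$ with respect to which $T$ has a block diagonal matrix of the form

\[ \begin{bmatrix} A_1 & & 0 \\ & \ddots & \\ 0 & & A_m \end{bmatrix}, \]

where each $A_j$ is an upper-triangular matrix of the form

8.29
\[ A_j = \begin{bmatrix} \lambda_j & & * \\ & \ddots & \\ 0 & & \lambda_j \end{bmatrix}. \]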

Proof: For $j = 1, \dots, m$, let $U_j$ denote the subspace of generalized
eigenvectors of $T$ corresponding to $\lambda_j$. Thus $(T - \lambda_j I)|_{U_j}$ is nilpotent
(see 8.23(c)). For each $j$, choose a basis of $U_j$ such that the matrix of
$(T - \lambda_j I)|_{U_j}$ with respect to this basis is as in 8.26. Thus the matrix of
$T|_{U_j}$ with respect to this basis will look like 8.29. Putting the bases for
the $U_j$'s together gives a basis for $V$ (by 8.23(a)). The matrix of $T$ with
respect to this basis has the desired form. ■

Square Roots

Recall that a square root of an operator $T \in \mathcal{L}(V)$ is an operator
$S \in \mathcal{L}(V)$ such that $S^2 = T$. As an application of the main structure
theorem from the last section, in this section we will show that every
invertible operator on a complex vector space has a square root.

Every complex number has a square root, but not every operator on
a complex vector space has a square root. An example of an operator
on $\mathbf{C}^3$ that has no square root is given in Exercise 4 in this chapter.
The noninvertibility of that particular operator is no accident, as we
will soon see. We begin by showing that the identity plus a nilpotent
operator always has a square root.


8.30 Lemma: Suppose $N \in \mathcal{L}(V)$ is nilpotent. Then $I + N$ has a
square root.

Proof: Consider the Taylor series for the function $\sqrt{1 + x}$:

8.31
\[ \sqrt{1 + x} = 1 + a_1 x + a_2 x^2 + \cdots. \]

We will not find an explicit formula for all the coefficients or worry
about whether the infinite sum converges because we are using this
equation only as motivation, not as a formal part of the proof.

Because $N$ is nilpotent, $N^m = 0$ for some positive integer $m$. In 8.31,
suppose we replace $x$ with $N$ and 1 with $I$. Then the infinite sum on
the right side becomes a finite sum (because $N^j = 0$ for all $j \geq m$). In
other words, we guess that there is a square root of $I + N$ of the form

\[ I + a_1 N + a_2 N^2 + \cdots + a_{m-1} N^{m-1}. \]

Having made this guess, we can try to choose $a_1, a_2, \dots, a_{m-1}$ so that
the operator above has its square equal to $I + N$. Now

\[
\begin{aligned}
(I + a_1 N + a_2 N^2 &+ a_3 N^3 + \cdots + a_{m-1} N^{m-1})^2 \\
&= I + 2a_1 N + (2a_2 + a_1^2) N^2 + (2a_3 + 2a_1 a_2) N^3 + \cdots \\
&\quad + (2a_{m-1} + \text{terms involving } a_1, \dots, a_{m-2}) N^{m-1}.
\end{aligned}
\]

We want the right side of the equation above to equal $I + N$. Hence
choose $a_1$ so that $2a_1 = 1$ (thus $a_1 = 1/2$). Next, choose $a_2$ so that
$2a_2 + a_1^2 = 0$ (thus $a_2 = -1/8$). Then choose $a_3$ so that the coefficient
of $N^3$ on the right side of the equation above equals 0 (thus $a_3 = 1/16$).
Continue in this fashion for $j = 4, \dots, m - 1$, at each step solving for
$a_j$ so that the coefficient of $N^j$ on the right side of the equation above
equals 0. Actually we do not care about the explicit formula for the
$a_j$'s. We need only know that some choice of the $a_j$'s gives a square
root of $I + N$. ■

Because $a_1 = 1/2$, this formula shows that $1 + x/2$ is a good
estimate for $\sqrt{1 + x}$ when $x$ is small.

The previous lemma is valid on real and complex vector spaces.
However, the next result holds only on complex vector spaces.

On real vector spaces there exist invertible operators that have no
square roots. For example, the operator of multiplication by $-1$
on $\mathbf{R}$ has no square root because no real number has its square
equal to $-1$.
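The coefficient-by-coefficient choice in the proof of 8.30 is a simple recursion, and it can be sketched in code. Everything below is illustrative rather than part of the text: the function names are assumptions, and the 4-by-4 shift matrix `N` is a hypothetical nilpotent operator chosen for the check. The recursion solves $2a_k + \sum_{i=1}^{k-1} a_i a_{k-i} = 0$ for $k \geq 2$, starting from $2a_1 = 1$.

```python
from fractions import Fraction

def sqrt_coefficients(m):
    """Return [a_0, ..., a_{m-1}] with a_0 = 1 such that the square of
    a_0 + a_1 x + ... + a_{m-1} x^(m-1) agrees with 1 + x up to degree m - 1."""
    a = [Fraction(1)]
    for k in range(1, m):
        target = Fraction(1) if k == 1 else Fraction(0)  # coefficient of x^k in 1 + x
        cross = sum(a[i] * a[k - i] for i in range(1, k))
        a.append((target - cross) / 2)
    return a

def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def sqrt_of_identity_plus(nil):
    """Square root of I + N for a nilpotent n-by-n matrix N (so N^n = 0 by 8.8)."""
    n = len(nil)
    result = [[Fraction(0)] * n for _ in range(n)]
    power = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]  # N^0
    for c in sqrt_coefficients(n):
        result = [[r + c * p for r, p in zip(rr, pr)] for rr, pr in zip(result, power)]
        power = matmul(power, nil)
    return result

# Hypothetical nilpotent operator: the 4-by-4 shift matrix, with N^4 = 0.
N = [[0, 1, 0, 0],
     [0, 0, 1, 0],
     [0, 0, 0, 1],
     [0, 0, 0, 0]]

S = sqrt_of_identity_plus(N)
identity_plus_N = [[int(i == j) + N[i][j] for j in range(4)] for i in range(4)]

print(sqrt_coefficients(4))              # 1, 1/2, -1/8, 1/16 as Fractions
print(matmul(S, S) == identity_plus_N)   # prints True
```

The first coefficients agree with the values $1/2$, $-1/8$, $1/16$ found in the proof, and no convergence question arises because the sum stops after finitely many terms.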

8.32 Theorem: Suppose $V$ is a complex vector space. If $T \in \mathcal{L}(V)$
is invertible, then $T$ has a square root.

Proof: Suppose $T \in \mathcal{L}(V)$ is invertible. Let $\lambda_1, \dots, \lambda_m$ be the
distinct eigenvalues of $T$, and let $U_1, \dots, U_m$ be the corresponding
subspaces of generalized eigenvectors. For each $j$, there exists a nilpotent
operator $N_j \in \mathcal{L}(U_j)$ such that $T|_{U_j} = \lambda_j I + N_j$ (see 8.23(c)). Because $T$
is invertible, none of the $\lambda_j$'s equals 0, so we can write

\[ T|_{U_j} = \lambda_j \left( I + \frac{N_j}{\lambda_j} \right) \]

for each $j$. Clearly $N_j/\lambda_j$ is nilpotent, and so $I + N_j/\lambda_j$ has a square
root (by 8.30). Multiplying a square root of the complex number $\lambda_j$ by
a square root of $I + N_j/\lambda_j$, we obtain a square root $S_j$ of $T|_{U_j}$.

A typical vector $v \in V$ can be written uniquely in the form

\[ v = u_1 + \cdots + u_m, \]

where each $u_j \in U_j$ (see 8.23). Using this decomposition, define an
operator $S \in \mathcal{L}(V)$ by

\[ S v = S_1 u_1 + \cdots + S_m u_m. \]

You should verify that this operator $S$ is a square root of $T$, completing
the proof. ■
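To see the construction in the proof of 8.32 at work, here is a small worked example. The operator is a hypothetical one chosen for illustration (it does not appear in the text): it has the single eigenvalue 4, so there is only one subspace $U_1 = \mathbf{C}^2$ and the nilpotent part satisfies $N^2 = 0$.

```latex
\[
T = \begin{bmatrix} 4 & 4 \\ 0 & 4 \end{bmatrix}
  = 4\Bigl(I + \tfrac{1}{4}N\Bigr),
\qquad
N = \begin{bmatrix} 0 & 4 \\ 0 & 0 \end{bmatrix},
\qquad N^2 = 0.
\]
% By 8.30 with a_1 = 1/2, and because (N/4)^2 = 0, a square root of I + N/4 is
\[
I + \tfrac{1}{2}\cdot\tfrac{1}{4}N = I + \tfrac{1}{8}N .
\]
% Multiplying by the square root 2 of the eigenvalue 4 gives
\[
S = 2\Bigl(I + \tfrac{1}{8}N\Bigr)
  = \begin{bmatrix} 2 & 1 \\ 0 & 2 \end{bmatrix},
\qquad
S^2 = \begin{bmatrix} 4 & 4 \\ 0 & 4 \end{bmatrix} = T .
\]
```

Choosing the other square root $-2$ of the eigenvalue gives $-S$, which is also a square root of $T$.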

By imitating the techniques in this section, you should be able to
prove that if $V$ is a complex vector space and $T \in \mathcal{L}(V)$ is invertible,
then $T$ has a $k$th root for every positive integer $k$.

The Minimal Polynomial

As we will soon see, given an operator on a finite-dimensional
vector space, there is a unique monic polynomial of smallest degree that
when applied to the operator gives 0. This polynomial is called the
minimal polynomial of the operator and is the focus of attention in
this section.

A monic polynomial is a polynomial whose highest degree coefficient
equals 1. For example, $2 + 3z^2 + z^8$ is a monic polynomial.

Suppose $T \in \mathcal{L}(V)$, where $\dim V = n$. Then

\[ (I, T, T^2, \dots, T^{n^2}) \]

cannot be linearly independent in $\mathcal{L}(V)$ because $\mathcal{L}(V)$ has dimension $n^2$
(see 3.20) and we have $n^2 + 1$ operators. Let $m$ be the smallest positive
integer such that

8.33
\[ (I, T, T^2, \dots, T^m) \]

is linearly dependent. The linear dependence lemma (2.4) implies that
one of the operators in the list above is a linear combination of the
previous ones. Because $m$ was chosen to be the smallest positive
integer such that 8.33 is linearly dependent, we conclude that $T^m$ is
a linear combination of $(I, T, T^2, \dots, T^{m-1})$. Thus there exist scalars
$a_0, a_1, a_2, \dots, a_{m-1} \in \mathbf{F}$ such that

\[ a_0 I + a_1 T + a_2 T^2 + \cdots + a_{m-1} T^{m-1} + T^m = 0. \]

The choice of scalars $a_0, a_1, a_2, \dots, a_{m-1} \in \mathbf{F}$ above is unique because
two different such choices would contradict our choice of $m$ (subtracting
two different equations of the form above, we would have a linearly
dependent list shorter than 8.33). The polynomial

\[ a_0 + a_1 z + a_2 z^2 + \cdots + a_{m-1} z^{m-1} + z^m \]

is called the minimal polynomial of $T$. It is the monic polynomial
$p \in \mathcal{P}(\mathbf{F})$ of smallest degree such that $p(T) = 0$.

For example, the minimal polynomial of the identity operator $I$ is
$z - 1$. The minimal polynomial of the operator on $\mathbf{F}^2$ whose matrix
equals $\begin{bmatrix} 4 & 1 \\ 0 & 5 \end{bmatrix}$ is $20 - 9z + z^2$, as you should verify.

Clearly the degree of the minimal polynomial of each operator on $V$
is at most $(\dim V)^2$. The Cayley-Hamilton theorem (8.20) tells us that
if $V$ is a complex vector space, then the minimal polynomial of each
operator on $V$ has degree at most $\dim V$. This remarkable improvement
also holds on real vector spaces, as we will see in the next chapter.

A polynomial $p \in \mathcal{P}(\mathbf{F})$ is said to divide a polynomial $q \in \mathcal{P}(\mathbf{F})$ if
there exists a polynomial $s \in \mathcal{P}(\mathbf{F})$ such that $q = sp$. In other words,
$p$ divides $q$ if we can take the remainder $r$ in 4.6 to be 0. For
example, the polynomial $(1 + 3z)^2$ divides $5 + 32z + 57z^2 + 18z^3$ because
$5 + 32z + 57z^2 + 18z^3 = (2z + 5)(1 + 3z)^2$. Obviously every nonzero
constant polynomial divides every polynomial.

Note that $(z - \lambda)$ divides a polynomial $q$ if and only if $\lambda$ is a
root of $q$. This follows immediately from 4.1.
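The factorization used in the divisibility example is easy to confirm by multiplying coefficient lists. In this sketch a polynomial is stored as a list of coefficients in order of increasing degree; the function name `poly_mul` is illustrative.

```python
def poly_mul(p, q):
    """Multiply polynomials given as coefficient lists, lowest degree first."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

p = [1, 3]  # 1 + 3z
s = [5, 2]  # 5 + 2z

product = poly_mul(s, poly_mul(p, p))  # (2z + 5)(1 + 3z)^2
print(product)  # prints [5, 32, 57, 18], that is, 5 + 32z + 57z^2 + 18z^3
```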

The next result completely characterizes the polynomials that when 
applied to an operator give the 0 operator. 


8.34 Theorem: Let $T \in \mathcal{L}(V)$ and let $q \in \mathcal{P}(\mathbf{F})$. Then $q(T) = 0$ if
and only if the minimal polynomial of $T$ divides $q$.

Proof: Let $p$ denote the minimal polynomial of $T$.

First we prove the easy direction. Suppose that $p$ divides $q$. Thus
there exists a polynomial $s \in \mathcal{P}(\mathbf{F})$ such that $q = sp$. We have

\[ q(T) = s(T)p(T) = s(T)\,0 = 0, \]

as desired.

To prove the other direction, suppose that $q(T) = 0$. By the division
algorithm (4.5), there exist polynomials $s, r \in \mathcal{P}(\mathbf{F})$ such that

8.35
\[ q = sp + r \]

and $\deg r < \deg p$. We have

\[ 0 = q(T) = s(T)p(T) + r(T) = r(T). \]

Because $p$ is the minimal polynomial of $T$ and $\deg r < \deg p$, the
equation above implies that $r = 0$. Thus 8.35 becomes the equation $q = sp$,
and hence $p$ divides $q$, as desired. ■

Now we describe the eigenvalues of an operator in terms of its
minimal polynomial.

8.36 Theorem: Let $T \in \mathcal{L}(V)$. Then the roots of the minimal
polynomial of $T$ are precisely the eigenvalues of $T$.

Proof: Let

\[ p(z) = a_0 + a_1 z + a_2 z^2 + \cdots + a_{m-1} z^{m-1} + z^m \]

be the minimal polynomial of $T$.

First suppose that $\lambda \in \mathbf{F}$ is a root of $p$. Then the minimal polynomial
of $T$ can be written in the form

\[ p(z) = (z - \lambda) q(z), \]

where $q$ is a monic polynomial with coefficients in $\mathbf{F}$ (see 4.1). Because
$p(T) = 0$, we have

\[ 0 = (T - \lambda I)(q(T)v) \]

for all $v \in V$. Because the degree of $q$ is less than the degree of the
minimal polynomial $p$, there must exist at least one vector $v \in V$ such
that $q(T)v \neq 0$. The equation above thus implies that $\lambda$ is an eigenvalue
of $T$, as desired.

To prove the other direction, now suppose that $\lambda \in \mathbf{F}$ is an
eigenvalue of $T$. Let $v$ be a nonzero vector in $V$ such that $Tv = \lambda v$. Repeated
applications of $T$ to both sides of this equation show that $T^j v = \lambda^j v$
for every nonnegative integer $j$. Thus

\[
\begin{aligned}
0 = p(T)v &= (a_0 + a_1 T + a_2 T^2 + \cdots + a_{m-1} T^{m-1} + T^m)v \\
&= (a_0 + a_1 \lambda + a_2 \lambda^2 + \cdots + a_{m-1} \lambda^{m-1} + \lambda^m)v \\
&= p(\lambda)v.
\end{aligned}
\]

Because $v \neq 0$, the equation above implies that $p(\lambda) = 0$, as desired. ■

Suppose we are given, in concrete form, the matrix (with respect to
some basis) of some operator $T \in \mathcal{L}(V)$. To find the minimal
polynomial of $T$, consider

\[ (\mathcal{M}(I), \mathcal{M}(T), \mathcal{M}(T)^2, \dots, \mathcal{M}(T)^m) \]

for $m = 1, 2, \dots$ until this list is linearly dependent. Then find the
scalars $a_0, a_1, a_2, \dots, a_{m-1} \in \mathbf{F}$ such that

\[ a_0 \mathcal{M}(I) + a_1 \mathcal{M}(T) + a_2 \mathcal{M}(T)^2 + \cdots + a_{m-1} \mathcal{M}(T)^{m-1} + \mathcal{M}(T)^m = 0. \]

The scalars $a_0, a_1, a_2, \dots, a_{m-1}, 1$ will then be the coefficients of the
minimal polynomial of $T$. All this can be computed using a familiar
process such as Gaussian elimination.

You can think of this as a system of $(\dim V)^2$ equations in $m$
variables $a_0, a_1, \dots, a_{m-1}$.

For example, consider the operator $T$ on $\mathbf{C}^5$ whose matrix is given
by

8.37
\[ \begin{bmatrix} 0 & 0 & 0 & 0 & -3 \\ 1 & 0 & 0 & 0 & 6 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \end{bmatrix}. \]

Because of the large number of 0's in this matrix, Gaussian elimination
is not needed here. Simply compute powers of $\mathcal{M}(T)$ and notice that
there is no linear dependence until the fifth power. Do the
computations and you will see that the minimal polynomial of $T$ equals

8.38
\[ z^5 - 6z + 3. \]

Now what about the eigenvalues of this particular operator? From 8.36,
we see that the eigenvalues of $T$ equal the solutions to the equation

\[ z^5 - 6z + 3 = 0. \]

Unfortunately no solution to this equation can be computed using
rational numbers, arbitrary roots of rational numbers, and the usual rules
of arithmetic (a proof of this would take us considerably beyond linear
algebra). Thus we cannot find an exact expression for any eigenvalues
of $T$ in any familiar form, though numeric techniques can give good
approximations for the eigenvalues of $T$. The numeric techniques, which
we will not discuss here, show that the eigenvalues for this particular
operator are approximately

\[ -1.67, \quad 0.51, \quad 1.40, \quad -0.12 + 1.59i, \quad -0.12 - 1.59i. \]

Note that the nonreal eigenvalues occur as a pair, with each the complex
conjugate of the other, as expected for the roots of a polynomial with
real coefficients (see 4.10).
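The procedure for finding a minimal polynomial (test the powers of the matrix for linear dependence, then solve for the scalars) can be sketched as follows. This is one possible implementation under stated assumptions, not the text's own: each matrix is flattened into a vector of length $(\dim V)^2$, and the helper `express` uses Gauss-Jordan elimination over the rationals to write the next power as a combination of the earlier ones. The names `express` and `minimal_polynomial` are illustrative.

```python
from fractions import Fraction

def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def express(vectors, target):
    """Return c with sum(c[k] * vectors[k]) == target, or None if no such c exists."""
    rows, cols = len(target), len(vectors)
    aug = [[Fraction(vectors[k][i]) for k in range(cols)] + [Fraction(target[i])]
           for i in range(rows)]
    pivots, r = [], 0
    for c in range(cols):
        p = next((i for i in range(r, rows) if aug[i][c] != 0), None)
        if p is None:
            continue
        aug[r], aug[p] = aug[p], aug[r]
        lead = aug[r][c]
        aug[r] = [x / lead for x in aug[r]]
        for i in range(rows):
            if i != r and aug[i][c] != 0:
                f = aug[i][c]
                aug[i] = [x - f * y for x, y in zip(aug[i], aug[r])]
        pivots.append(c)
        r += 1
    if any(row[-1] != 0 for row in aug[r:]):
        return None  # inconsistent system: target is not in the span
    coeffs = [Fraction(0)] * cols
    for i, c in enumerate(pivots):
        coeffs[c] = aug[i][-1]
    return coeffs

def minimal_polynomial(t):
    """Coefficients [a_0, ..., a_{m-1}, 1] of the minimal polynomial of the matrix t."""
    n = len(t)
    flat = lambda a: [x for row in a for x in row]
    powers = [[[int(i == j) for j in range(n)] for i in range(n)]]  # starts with I
    while True:
        nxt = matmul(powers[-1], t)  # the next power T^m
        c = express([flat(p) for p in powers], flat(nxt))
        if c is not None:  # T^m = c_0 I + ... + c_{m-1} T^{m-1}
            return [-x for x in c] + [Fraction(1)]
        powers.append(nxt)

# The matrix 8.37 from the text.
T = [[0, 0, 0, 0, -3],
     [1, 0, 0, 0, 6],
     [0, 1, 0, 0, 0],
     [0, 0, 1, 0, 0],
     [0, 0, 0, 1, 0]]

coefficients = minimal_polynomial(T)
print([int(x) for x in coefficients])  # prints [3, -6, 0, 0, 0, 1], i.e. z^5 - 6z + 3
```

The loop must stop after at most $(\dim V)^2$ steps, because a list of $(\dim V)^2 + 1$ operators cannot be linearly independent.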

Suppose $V$ is a complex vector space and $T \in \mathcal{L}(V)$. The
Cayley-Hamilton theorem (8.20) and 8.34 imply that the minimal polynomial
of $T$ divides the characteristic polynomial of $T$. Both these polynomials
are monic. Thus if the minimal polynomial of $T$ has degree $\dim V$, then
it must equal the characteristic polynomial of $T$. For example, if $T$ is
the operator on $\mathbf{C}^5$ whose matrix is given by 8.37, then the
characteristic polynomial of $T$, as well as the minimal polynomial of $T$, is given
by 8.38.

Jordan Form

We know that if $V$ is a complex vector space, then for every $T \in \mathcal{L}(V)$
there is a basis of $V$ with respect to which $T$ has a nice upper-triangular
matrix (see 8.28). In this section we will see that we can do even better:
there is a basis of $V$ with respect to which the matrix of $T$ contains zeros
everywhere except possibly on the diagonal and the line directly above
the diagonal.

We begin by describing the nilpotent operators. Consider, for
example, the nilpotent operator $N \in \mathcal{L}(\mathbf{F}^n)$ defined by

\[ N(z_1, \dots, z_n) = (0, z_1, \dots, z_{n-1}). \]

If $v = (1, 0, \dots, 0)$, then clearly $(v, Nv, \dots, N^{n-1}v)$ is a basis of $\mathbf{F}^n$ and
$(N^{n-1}v)$ is a basis of $\operatorname{null} N$, which has dimension 1.

As another example, consider the nilpotent operator $N \in \mathcal{L}(\mathbf{F}^5)$
defined by

8.39
\[ N(z_1, z_2, z_3, z_4, z_5) = (0, z_1, z_2, 0, z_4). \]

Unlike the nilpotent operator discussed in the previous paragraph, for
this nilpotent operator there does not exist a vector $v \in \mathbf{F}^5$ such that
$(v, Nv, N^2v, N^3v, N^4v)$ is a basis of $\mathbf{F}^5$. However, if $v_1 = (1, 0, 0, 0, 0)$
and $v_2 = (0, 0, 0, 1, 0)$, then $(v_1, Nv_1, N^2v_1, v_2, Nv_2)$ is a basis of $\mathbf{F}^5$
and $(N^2v_1, Nv_2)$ is a basis of $\operatorname{null} N$, which has dimension 2.

Suppose $N \in \mathcal{L}(V)$ is nilpotent. For each nonzero vector $v \in V$, let
$m(v)$ denote the largest nonnegative integer such that $N^{m(v)}v \neq 0$. For
example, if $N \in \mathcal{L}(\mathbf{F}^5)$ is defined by 8.39, then $m(1, 0, 0, 0, 0) = 2$.

The lemma below shows that every nilpotent operator $N \in \mathcal{L}(V)$
behaves similarly to the example defined by 8.39, in the sense that there
is a finite collection of vectors $v_1, \dots, v_k \in V$ such that the nonzero
vectors of the form $N^j v_r$ form a basis of $V$; here $r$ varies from 1 to $k$
and $j$ varies from 0 to $m(v_r)$.
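The quantities in the example 8.39 can be computed by applying $N$ repeatedly until the zero vector appears. The sketch below represents vectors in $\mathbf{F}^5$ as tuples; the function names `N`, `chain`, and `m` are illustrative, and the loop in `chain` terminates only because $N$ is nilpotent.

```python
def N(z):
    """The nilpotent operator 8.39: N(z1, z2, z3, z4, z5) = (0, z1, z2, 0, z4)."""
    z1, z2, z3, z4, z5 = z
    return (0, z1, z2, 0, z4)

def chain(v):
    """Return [v, Nv, ..., N^{m(v)} v], the nonzero vectors obtained from v."""
    out = []
    while any(v):
        out.append(v)
        v = N(v)
    return out

def m(v):
    """Largest nonnegative integer j with N^j v nonzero (v must be nonzero)."""
    return len(chain(v)) - 1

v1 = (1, 0, 0, 0, 0)
v2 = (0, 0, 0, 1, 0)

basis = chain(v1) + chain(v2)              # (v1, N v1, N^2 v1, v2, N v2)
null_basis = [chain(v1)[-1], chain(v2)[-1]]  # (N^2 v1, N v2)

print(m(v1), m(v2))  # prints 2 1
print(basis)         # the five standard basis vectors of F^5
print(null_basis)    # [(0, 0, 1, 0, 0), (0, 0, 0, 0, 1)]
```

Here the combined chains are exactly the standard basis vectors, and the last vector of each chain is sent to 0 by $N$, matching the claims made about 8.39.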

8.40 Lemma: If N e £(V) is nilpotent, then there exist vectors 
Vi, ..., Vfc e V such that 


Obviously m(v) 
depends on N as well 
as on v, but the choice 
of N will be clear from 
the context. 


(a) (vi.Nvi,..., jv m(Vl) v 1 ,..., Vk,NVk, ■ ■. ,N m(Vk) vO isabasis of V; 

(b) (JV w(Vl) Vi,... ,N m(Vk) v k ) is a basis of nulliV. 
Proof: Suppose $N$ is nilpotent. Then $N$ is not injective and thus $\dim \operatorname{range} N < \dim V$ (see 3.21). By induction on the dimension of $V$, we can assume that the lemma holds on all vector spaces of smaller dimension. Using $\operatorname{range} N$ in place of $V$ and $N|_{\operatorname{range} N}$ in place of $N$, we thus have vectors $u_1, \dots, u_j \in \operatorname{range} N$ such that

(i) $(u_1, Nu_1, \dots, N^{m(u_1)}u_1, \dots, u_j, Nu_j, \dots, N^{m(u_j)}u_j)$ is a basis of $\operatorname{range} N$;

(ii) $(N^{m(u_1)}u_1, \dots, N^{m(u_j)}u_j)$ is a basis of $\operatorname{null} N \cap \operatorname{range} N$.

Because each $u_r \in \operatorname{range} N$, we can choose $v_1, \dots, v_j \in V$ such that $Nv_r = u_r$ for each $r$. Note that $m(v_r) = m(u_r) + 1$ for each $r$.

Let $W$ be a subspace of $\operatorname{null} N$ such that

\[ \operatorname{null} N = (\operatorname{null} N \cap \operatorname{range} N) \oplus W \tag{8.41} \]

and choose a basis of $W$, which we will label $(v_{j+1}, \dots, v_k)$. (The existence of a subspace $W$ with this property follows from 2.13.) Because $v_{j+1}, \dots, v_k \in \operatorname{null} N$, we have $m(v_{j+1}) = \dots = m(v_k) = 0$.

Having constructed Vi,...,Vfc, we now need to show that (a) and 
(b) hold. We begin by showing that the alleged basis in (a) is linearly 
independent. To do this, suppose 

\[ 0 = \sum_{r=1}^{k} \sum_{s=0}^{m(v_r)} a_{r,s} N^{s}(v_r), \tag{8.42} \]

where each $a_{r,s} \in \mathbf{F}$. Applying $N$ to both sides of the equation above, we get

\[ \begin{aligned} 0 &= \sum_{r=1}^{k} \sum_{s=0}^{m(v_r)} a_{r,s} N^{s+1}(v_r) \\ &= \sum_{r=1}^{j} \sum_{s=0}^{m(u_r)} a_{r,s} N^{s}(u_r). \end{aligned} \]

The last equation, along with (i), implies that $a_{r,s} = 0$ for $1 \le r \le j$, $0 \le s \le m(v_r) - 1$. Thus 8.42 reduces to the equation

\[ \begin{aligned} 0 = {} & a_{1,m(v_1)} N^{m(v_1)} v_1 + \dots + a_{j,m(v_j)} N^{m(v_j)} v_j \\ & + a_{j+1,0} v_{j+1} + \dots + a_{k,0} v_k. \end{aligned} \]

The terms on the first line on the right are all in $\operatorname{null} N \cap \operatorname{range} N$; the terms on the second line are all in $W$. Thus the last equation and 8.41 imply that

\[ \begin{aligned} 0 &= a_{1,m(v_1)} N^{m(v_1)} v_1 + \dots + a_{j,m(v_j)} N^{m(v_j)} v_j \\ &= a_{1,m(v_1)} N^{m(u_1)} u_1 + \dots + a_{j,m(v_j)} N^{m(u_j)} u_j \end{aligned} \tag{8.43} \]

and

\[ 0 = a_{j+1,0} v_{j+1} + \dots + a_{k,0} v_k. \tag{8.44} \]

Now 8.43 and (ii) imply that $a_{1,m(v_1)} = \dots = a_{j,m(v_j)} = 0$. Because $(v_{j+1}, \dots, v_k)$ is a basis of $W$, 8.44 implies that $a_{j+1,0} = \dots = a_{k,0} = 0$. Thus all the $a$'s equal 0, and hence the list of vectors in (a) is linearly independent.

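Clearly (ii) implies that $\dim(\operatorname{null} N \cap \operatorname{range} N) = j$. Along with 8.41, this implies that

\[ \dim \operatorname{null} N = k. \tag{8.45} \]

Clearly (i) implies that

\[ \dim \operatorname{range} N = \sum_{r=1}^{j} (m(u_r) + 1) = \sum_{r=1}^{j} m(v_r). \tag{8.46} \]

The list of vectors in (a) has length

\[ \begin{aligned} \sum_{r=1}^{k} (m(v_r) + 1) &= k + \sum_{r=1}^{j} m(v_r) \\ &= \dim \operatorname{null} N + \dim \operatorname{range} N \\ &= \dim V, \end{aligned} \]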

where the second equality comes from 8.45 and 8.46, and the third 
equality comes from 3.4. The last equation shows that the list of vectors 
in (a) has length dim V; because this list is linearly independent, it is a 
basis of V (see 2.17), completing the proof of (a). 

Finally, note that 

\[ (N^{m(v_1)} v_1, \dots, N^{m(v_k)} v_k) = (N^{m(u_1)} u_1, \dots, N^{m(u_j)} u_j, v_{j+1}, \dots, v_k). \]

Now (ii) and 8.41 show that the last list above is a basis of $\operatorname{null} N$, completing the proof of (b). ■

Suppose $T \in \mathcal{L}(V)$. A basis of $V$ is called a Jordan basis for $T$ if with respect to this basis $T$ has a block diagonal matrix

\[ \begin{bmatrix} A_1 & & 0 \\ & \ddots & \\ 0 & & A_m \end{bmatrix}, \]

where each $A_j$ is an upper-triangular matrix of the form

\[ A_j = \begin{bmatrix} \lambda_j & 1 & & 0 \\ & \ddots & \ddots & \\ & & \ddots & 1 \\ 0 & & & \lambda_j \end{bmatrix}. \]

In each $A_j$, the diagonal is filled with some eigenvalue $\lambda_j$ of $T$, the line directly above the diagonal is filled with 1's, and all other entries are 0 ($A_j$ may be just a 1-by-1 block consisting of just some eigenvalue). (To understand why each $\lambda_j$ must be an eigenvalue of $T$, see 5.18.)

Because there exist operators on real vector spaces that have no eigenvalues, there exist operators on real vector spaces for which there is no corresponding Jordan basis. Thus the hypothesis that $V$ is a complex vector space is required for the next result, even though the previous lemma holds on both real and complex vector spaces.

8.47 Theorem: Suppose $V$ is a complex vector space. If $T \in \mathcal{L}(V)$, then there is a basis of $V$ that is a Jordan basis for $T$.

(The French mathematician Camille Jordan first published a proof of this theorem in 1870.)

Proof: First consider a nilpotent operator $N \in \mathcal{L}(V)$ and the vectors $v_1, \dots, v_k \in V$ given by 8.40. For each $j$, note that $N$ sends the first vector in the list $(N^{m(v_j)}v_j, \dots, Nv_j, v_j)$ to 0 and that $N$ sends each vector in this list other than the first vector to the previous vector. In other words, if we reverse the order of the basis given by 8.40(a), then we obtain a basis of $V$ with respect to which $N$ has a block diagonal matrix, where each matrix on the diagonal has the form

\[ \begin{bmatrix} 0 & 1 & & 0 \\ & \ddots & \ddots & \\ & & \ddots & 1 \\ 0 & & & 0 \end{bmatrix}. \]

Thus the theorem holds for nilpotent operators.

Now suppose $T \in \mathcal{L}(V)$. Let $\lambda_1, \dots, \lambda_m$ be the distinct eigenvalues of $T$, with $U_1, \dots, U_m$ the corresponding subspaces of generalized eigenvectors. We have

\[ V = U_1 \oplus \dots \oplus U_m, \]

where each $(T - \lambda_j I)|_{U_j}$ is nilpotent (see 8.23). By the previous paragraph, there is a basis of each $U_j$ that is a Jordan basis for $(T - \lambda_j I)|_{U_j}$. Putting these bases together gives a basis of $V$ that is a Jordan basis for $T$. ■
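The first step of the proof, reversing the basis from 8.40(a), can be illustrated on the operator 8.39. The sketch below is an illustration under stated assumptions rather than part of the original text: the helpers `coords` and `matrix_of` are hypothetical names, and coordinates are found by Gauss-Jordan elimination with exact fractions, assuming the supplied list really is a basis.

```python
from fractions import Fraction

def N(z):
    """The nilpotent operator 8.39 on F^5."""
    z1, z2, z3, z4, z5 = z
    return (0, z1, z2, 0, z4)

def coords(basis, w):
    """Coordinates of w in the given basis (assumed to be a basis of F^n)."""
    n = len(basis)
    # Augmented matrix: columns are the basis vectors, last column is w.
    aug = [[Fraction(basis[c][r]) for c in range(n)] + [Fraction(w[r])]
           for r in range(n)]
    for col in range(n):
        pivot = next(i for i in range(col, n) if aug[i][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [x / aug[col][col] for x in aug[col]]
        for i in range(n):
            if i != col:
                factor = aug[i][col]
                aug[i] = [a - factor * b for a, b in zip(aug[i], aug[col])]
    return [aug[r][n] for r in range(n)]

def matrix_of(op, basis):
    """Matrix of op in the basis: column c holds the coordinates of op(basis[c])."""
    n = len(basis)
    columns = [coords(basis, op(b)) for b in basis]
    return [[columns[c][r] for c in range(n)] for r in range(n)]

v1 = (1, 0, 0, 0, 0)
v2 = (0, 0, 0, 1, 0)
basis_a = [v1, N(v1), N(N(v1)), v2, N(v2)]   # the basis from 8.40(a)
jordan_basis = basis_a[::-1]                 # reversed order

J = [[int(x) for x in row] for row in matrix_of(N, jordan_basis)]
for row in J:
    print(row)
# [0, 1, 0, 0, 0]
# [0, 0, 0, 0, 0]
# [0, 0, 0, 1, 0]
# [0, 0, 0, 0, 1]
# [0, 0, 0, 0, 0]
```

The output is block diagonal with a 2-by-2 block followed by a 3-by-3 block, each with 0's on the diagonal and 1's directly above it. Using `basis_a` without reversing would instead place the 1's directly below the diagonal.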
Exercises

1. Define $T \in \mathcal{L}(\mathbf{C}^2)$ by

\[ T(w, z) = (z, 0). \]

Find all generalized eigenvectors of $T$.

2. Define $T \in \mathcal{L}(\mathbf{C}^2)$ by

\[ T(w, z) = (-z, w). \]

Find all generalized eigenvectors of $T$.

3. Suppose $T \in \mathcal{L}(V)$, $m$ is a positive integer, and $v \in V$ is such that $T^{m-1}v \ne 0$ but $T^m v = 0$. Prove that

\[ (v, Tv, T^2v, \dots, T^{m-1}v) \]

is linearly independent.

4. Suppose $T \in \mathcal{L}(\mathbf{C}^3)$ is defined by $T(z_1, z_2, z_3) = (z_2, z_3, 0)$. Prove that $T$ has no square root. More precisely, prove that there does not exist $S \in \mathcal{L}(\mathbf{C}^3)$ such that $S^2 = T$.

5. Suppose $S, T \in \mathcal{L}(V)$. Prove that if $ST$ is nilpotent, then $TS$ is nilpotent.

6. Suppose $N \in \mathcal{L}(V)$ is nilpotent. Prove (without using 8.26) that 0 is the only eigenvalue of $N$.

7. Suppose $V$ is an inner-product space. Prove that if $N \in \mathcal{L}(V)$ is self-adjoint and nilpotent, then $N = 0$.

8. Suppose $N \in \mathcal{L}(V)$ is such that $\operatorname{null} N^{\dim V - 1} \ne \operatorname{null} N^{\dim V}$. Prove that $N$ is nilpotent and that

\[ \dim \operatorname{null} N^j = j \]

for every integer $j$ with $0 \le j \le \dim V$.

9. Suppose $T \in \mathcal{L}(V)$ and $m$ is a nonnegative integer such that

\[ \operatorname{range} T^m = \operatorname{range} T^{m+1}. \]

Prove that $\operatorname{range} T^k = \operatorname{range} T^m$ for all $k > m$.
10. Prove or give a counterexample: if $T \in \mathcal{L}(V)$, then

\[ V = \operatorname{null} T \oplus \operatorname{range} T. \]

11. Prove that if $T \in \mathcal{L}(V)$, then

\[ V = \operatorname{null} T^n \oplus \operatorname{range} T^n, \]

where $n = \dim V$.

12. Suppose $V$ is a complex vector space, $N \in \mathcal{L}(V)$, and 0 is the only eigenvalue of $N$. Prove that $N$ is nilpotent. Give an example to show that this is not necessarily true on a real vector space.

13. Suppose that $V$ is a complex vector space with $\dim V = n$ and $T \in \mathcal{L}(V)$ is such that

\[ \operatorname{null} T^{n-2} \ne \operatorname{null} T^{n-1}. \]

Prove that $T$ has at most two distinct eigenvalues.

14. Give an example of an operator on $\mathbf{C}^4$ whose characteristic polynomial equals $(z - 7)^2 (z - 8)^2$.

15. Suppose $V$ is a complex vector space. Suppose $T \in \mathcal{L}(V)$ is such that 5 and 6 are eigenvalues of $T$ and that $T$ has no other eigenvalues. Prove that

\[ (T - 5I)^{n-1} (T - 6I)^{n-1} = 0, \]

where $n = \dim V$.

16. Suppose $V$ is a complex vector space and $T \in \mathcal{L}(V)$. Prove that $V$ has a basis consisting of eigenvectors of $T$ if and only if every generalized eigenvector of $T$ is an eigenvector of $T$.

(For complex vector spaces, this exercise adds another equivalence to the list given by 5.21.)

17. Suppose $V$ is an inner-product space and $N \in \mathcal{L}(V)$ is nilpotent. Prove that there exists an orthonormal basis of $V$ with respect to which $N$ has an upper-triangular matrix.

18. Define $N \in \mathcal{L}(\mathbf{F}^5)$ by

\[ N(x_1, x_2, x_3, x_4, x_5) = (2x_2, 3x_3, -x_4, 4x_5, 0). \]

Find a square root of $I + N$.
19. Prove that if $V$ is a complex vector space, then every invertible operator on $V$ has a cube root.

20. Suppose $T \in \mathcal{L}(V)$ is invertible. Prove that there exists a polynomial $p \in \mathcal{P}(\mathbf{F})$ such that $T^{-1} = p(T)$.

21. Give an example of an operator on $\mathbf{C}^3$ whose minimal polynomial equals $z^2$.

22. Give an example of an operator on $\mathbf{C}^4$ whose minimal polynomial equals $z(z - 1)^2$.

23. Suppose $V$ is a complex vector space and $T \in \mathcal{L}(V)$. Prove that $V$ has a basis consisting of eigenvectors of $T$ if and only if the minimal polynomial of $T$ has no repeated roots.

(For complex vector spaces, this exercise adds another equivalence to the list given by 5.21.)

24. Suppose $V$ is an inner-product space. Prove that if $T \in \mathcal{L}(V)$ is normal, then the minimal polynomial of $T$ has no repeated roots.

25. Suppose $T \in \mathcal{L}(V)$ and $v \in V$. Let $p$ be the monic polynomial of smallest degree such that

\[ p(T)v = 0. \]

Prove that $p$ divides the minimal polynomial of $T$.

26. Give an example of an operator on $\mathbf{C}^4$ whose characteristic and minimal polynomials both equal $z(z - 1)^2(z - 3)$.

27. Give an example of an operator on $\mathbf{C}^4$ whose characteristic polynomial equals $z(z - 1)^2(z - 3)$ and whose minimal polynomial equals $z(z - 1)(z - 3)$.

28. Suppose $a_0, \dots, a_{n-1} \in \mathbf{C}$. Find the minimal and characteristic polynomials of the operator on $\mathbf{C}^n$ whose matrix (with respect to the standard basis) is

\[ \begin{bmatrix} 0 & & & & -a_0 \\ 1 & 0 & & & -a_1 \\ & 1 & \ddots & & -a_2 \\ & & \ddots & 0 & -a_{n-2} \\ & & & 1 & -a_{n-1} \end{bmatrix}. \]

(This exercise shows that every monic polynomial is the characteristic polynomial of some operator.)

29. Suppose $N \in \mathcal{L}(V)$ is nilpotent. Prove that the minimal polynomial of $N$ is $z^{m+1}$, where $m$ is the length of the longest consecutive string of 1's that appears on the line directly above the diagonal in the matrix of $N$ with respect to any Jordan basis for $N$.

30. Suppose $V$ is a complex vector space and $T \in \mathcal{L}(V)$. Prove that there does not exist a direct sum decomposition of $V$ into two proper subspaces invariant under $T$ if and only if the minimal polynomial of $T$ is of the form $(z - \lambda)^{\dim V}$ for some $\lambda \in \mathbf{C}$.

31. Suppose $T \in \mathcal{L}(V)$ and $(v_1, \dots, v_n)$ is a basis of $V$ that is a Jordan basis for $T$. Describe the matrix of $T$ with respect to the basis $(v_n, \dots, v_1)$ obtained by reversing the order of the $v$'s.



Chapter 9

Operators on Real Vector Spaces


In this chapter we delve deeper into the structure of operators on real vector spaces. The important results here are somewhat more complex than the analogous results from the last chapter on complex vector spaces.

Recall that F denotes R or C. 

Also, V is a finite-dimensional, nonzero vector space over F. 

Some of the new results in this chapter are valid on complex vector 
spaces, so we have not assumed that V is a real vector space. 


Eigenvalues of Square Matrices

We have defined eigenvalues of operators; now we need to extend that notion to square matrices. Suppose $A$ is an $n$-by-$n$ matrix with entries in $\mathbf{F}$. A number $\lambda \in \mathbf{F}$ is called an eigenvalue of $A$ if there exists a nonzero $n$-by-1 matrix $x$ such that

\[ Ax = \lambda x. \]


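For example, 3 is an eigenvalue of $\begin{bmatrix} 7 & 8 \\ 1 & 5 \end{bmatrix}$ because

\[ \begin{bmatrix} 7 & 8 \\ 1 & 5 \end{bmatrix} \begin{bmatrix} 2 \\ -1 \end{bmatrix} = \begin{bmatrix} 6 \\ -3 \end{bmatrix} = 3 \begin{bmatrix} 2 \\ -1 \end{bmatrix}. \]

As another example, you should verify that the matrix $\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$ has no eigenvalues if we are thinking of $\mathbf{F}$ as the real numbers (by definition, an eigenvalue must be in $\mathbf{F}$) and has eigenvalues $i$ and $-i$ if we are thinking of $\mathbf{F}$ as the complex numbers.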
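The definition can be tested directly: a number is an eigenvalue of a matrix exactly when some nonzero column vector is scaled by it. The sketch below is only an illustration, with the hypothetical helper names `mat_vec` and `is_eigenvector_for`; it takes the first matrix to have rows (7, 8) and (1, 5) and uses Python's built-in complex numbers for the second.

```python
def mat_vec(A, x):
    """Product of a square matrix (list of rows) with a column vector (list)."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def is_eigenvector_for(A, lam, x):
    """True if x is nonzero and A x = lam x, as in the definition of an eigenvalue."""
    return any(x) and mat_vec(A, x) == [lam * xi for xi in x]

A = [[7, 8], [1, 5]]
print(mat_vec(A, [2, -1]))                     # [6, -3]
print(is_eigenvector_for(A, 3, [2, -1]))       # True

R = [[0, -1], [1, 0]]
print(is_eigenvector_for(R, 1j, [1, -1j]))     # True
print(is_eigenvector_for(R, -1j, [1, 1j]))     # True
```

No finite check can show that the second matrix lacks real eigenvalues, but the algebra is short: $Rx = \lambda x$ means $-x_2 = \lambda x_1$ and $x_1 = \lambda x_2$, so $(1 + \lambda^2)x_1 = 0$, which forces $x = 0$ when $\lambda$ is real.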

We now have two notions of eigenvalue—one for operators and one 
for square matrices. As you might expect, these two notions are closely 
connected, as we now show. 

9.1 Proposition: Suppose $T \in \mathcal{L}(V)$ and $A$ is the matrix of $T$ with respect to some basis of $V$. Then the eigenvalues of $T$ are the same as the eigenvalues of $A$.

Proof: Let $(v_1, \dots, v_n)$ be the basis of $V$ with respect to which $T$ has matrix $A$. Let $\lambda \in \mathbf{F}$. We need to show that $\lambda$ is an eigenvalue of $T$ if and only if $\lambda$ is an eigenvalue of $A$.

First suppose $\lambda$ is an eigenvalue of $T$. Let $v \in V$ be a nonzero vector such that $Tv = \lambda v$. We can write

\[ v = a_1 v_1 + \dots + a_n v_n, \tag{9.2} \]

where $a_1, \dots, a_n \in \mathbf{F}$. Let $x$ be the matrix of the vector $v$ with respect to the basis $(v_1, \dots, v_n)$. Recall from Chapter 3 that this means

\[ x = \begin{bmatrix} a_1 \\ \vdots \\ a_n \end{bmatrix}. \tag{9.3} \]

Note that $x \ne 0$ because $v \ne 0$. We have

\[ Ax = \mathcal{M}(T)\mathcal{M}(v) = \mathcal{M}(Tv) = \mathcal{M}(\lambda v) = \lambda \mathcal{M}(v) = \lambda x, \]

where the second equality comes from 3.14. The equation above shows that $\lambda$ is an eigenvalue of $A$, as desired.

To prove the implication in the other direction, now suppose $\lambda$ is an eigenvalue of $A$. Let $x$ be a nonzero $n$-by-1 matrix such that $Ax = \lambda x$. We can write $x$ in the form 9.3 for some scalars $a_1, \dots, a_n \in \mathbf{F}$. Define $v \in V$ by 9.2; then $v \ne 0$ because $x \ne 0$. Then

\[ \mathcal{M}(Tv) = \mathcal{M}(T)\mathcal{M}(v) = Ax = \lambda x = \mathcal{M}(\lambda v), \]

where the first equality comes from 3.14. The equation above implies that $Tv = \lambda v$, and thus $\lambda$ is an eigenvalue of $T$, completing the proof. ■

Because every square matrix is the matrix of some operator, the 
proposition above allows us to translate results about eigenvalues of 
operators into the language of eigenvalues of square matrices. For 
example, every square matrix of complex numbers has an eigenvalue 
(from 5.10). As another example, every n-by-n matrix has at most n 
distinct eigenvalues (from 5.9). 

Block Upper-Triangular Matrices

Earlier we proved that each operator on a complex vector space has 
an upper-triangular matrix with respect to some basis (see 5.13). In 
this section we will see that we can almost do as well on real vector 
spaces. 

In the last two chapters we used block diagonal matrices, which extend the notion of diagonal matrices. Now we will need to use the corresponding extension of upper-triangular matrices. A block upper-triangular matrix is a square matrix of the form

\[ \begin{bmatrix} A_1 & & * \\ & \ddots & \\ 0 & & A_m \end{bmatrix}, \]

where $A_1, \dots, A_m$ are square matrices lying along the diagonal, all entries below $A_1, \dots, A_m$ equal 0, and the $*$ denotes arbitrary entries. (As usual, we use an asterisk to denote entries of the matrix that play no important role in the topics under consideration.) For example, the matrix

\[ A = \begin{bmatrix} 4 & 10 & 11 & 12 & 13 \\ 0 & -3 & -3 & 14 & 25 \\ 0 & -3 & -3 & 16 & 17 \\ 0 & 0 & 0 & 5 & 5 \\ 0 & 0 & 0 & 5 & 5 \end{bmatrix} \]

is a block upper-triangular matrix with

\[ A = \begin{bmatrix} A_1 & & * \\ & A_2 & \\ 0 & & A_3 \end{bmatrix}, \]

where

\[ A_1 = \begin{bmatrix} 4 \end{bmatrix}, \quad A_2 = \begin{bmatrix} -3 & -3 \\ -3 & -3 \end{bmatrix}, \quad A_3 = \begin{bmatrix} 5 & 5 \\ 5 & 5 \end{bmatrix}. \]

(Every upper-triangular matrix is also a block upper-triangular matrix with blocks of size 1-by-1 along the diagonal. At the other extreme, every square matrix is a block upper-triangular matrix because we can take the first (and only) block to be the entire matrix. Smaller blocks are better in the sense that the matrix then has more 0's.)

Now we prove that for each operator on a real vector space, we can 
find a basis that gives a block upper-triangular matrix with blocks of 
size at most 2-by-2 on the diagonal. 


9.4 Theorem: Suppose $V$ is a real vector space and $T \in \mathcal{L}(V)$. Then there is a basis of $V$ with respect to which $T$ has a block upper-triangular matrix

\[ \begin{bmatrix} A_1 & & * \\ & \ddots & \\ 0 & & A_m \end{bmatrix}, \tag{9.5} \]

where each $A_j$ is a 1-by-1 matrix or a 2-by-2 matrix with no eigenvalues.

Proof: Clearly the desired result holds if $\dim V = 1$.

Next, consider the case where $\dim V = 2$. If $T$ has an eigenvalue $\lambda$, then let $v_1 \in V$ be any nonzero eigenvector. Extend $(v_1)$ to a basis $(v_1, v_2)$ of $V$. With respect to this basis, $T$ has an upper-triangular matrix of the form

\[ \begin{bmatrix} \lambda & a \\ 0 & b \end{bmatrix}. \]

In particular, if $T$ has an eigenvalue, then there is a basis of $V$ with respect to which $T$ has an upper-triangular matrix. If $T$ has no eigenvalues, then choose any basis $(v_1, v_2)$ of $V$. With respect to this basis, the matrix of $T$ has no eigenvalues (by 9.1). Thus regardless of whether $T$ has eigenvalues, we have the desired conclusion when $\dim V = 2$.

Suppose now that $\dim V > 2$ and the desired result holds for all real vector spaces with smaller dimension. If $T$ has an eigenvalue, let $U$ be a one-dimensional subspace of $V$ that is invariant under $T$; otherwise let $U$ be a two-dimensional subspace of $V$ that is invariant under $T$ (5.24 guarantees that we can choose $U$ in this fashion). Choose any basis of $U$ and let $A_1$ denote the matrix of $T|_U$ with respect to this basis. If $A_1$ is a 2-by-2 matrix, then $T$ has no eigenvalues (otherwise we would have chosen $U$ to be one-dimensional) and thus $T|_U$ has no eigenvalues. Hence if $A_1$ is a 2-by-2 matrix, then $A_1$ has no eigenvalues (see 9.1).

Let $W$ be any subspace of $V$ such that

\[ V = U \oplus W; \]

2.13 guarantees that such a $W$ exists. Because $W$ has dimension less than the dimension of $V$, we would like to apply our induction hypothesis to $T|_W$. However, $W$ might not be invariant under $T$, meaning that $T|_W$ might not be an operator on $W$. We will compose with the projection $P_{W,U}$ to get an operator on $W$. (Recall that if $v = w + u$, where $w \in W$ and $u \in U$, then $P_{W,U}v = w$.) Specifically, define $S \in \mathcal{L}(W)$ by

\[ Sw = P_{W,U}(Tw) \]

for $w \in W$. Note that

\[ \begin{aligned} Tw &= P_{U,W}(Tw) + P_{W,U}(Tw) \\ &= P_{U,W}(Tw) + Sw \end{aligned} \tag{9.6} \]

for every $w \in W$.

By our induction hypothesis, there is a basis of $W$ with respect to which $S$ has a block upper-triangular matrix of the form

\[ \begin{bmatrix} A_2 & & * \\ & \ddots & \\ 0 & & A_m \end{bmatrix}, \]

where each $A_j$ is a 1-by-1 matrix or a 2-by-2 matrix with no eigenvalues. Adjoin this basis of $W$ to the basis of $U$ chosen above, getting a basis of $V$. A minute's thought should convince you (use 9.6) that the matrix of $T$ with respect to this basis is a block upper-triangular matrix of the form 9.5, completing the proof. ■
The Characteristic Polynomial

For operators on complex vector spaces, we defined characteristic polynomials and developed their properties by making use of upper-triangular matrices. In this section we will carry out a similar procedure for operators on real vector spaces. Instead of upper-triangular matrices, we will have to use the block upper-triangular matrices furnished by the last theorem.

In the last chapter, we did not define the characteristic polynomial 
of a square matrix with complex entries because our emphasis is on 
operators rather than on matrices. However, to understand operators 
on real vector spaces, we will need to define characteristic polynomials 
of 1-by-1 and 2-by-2 matrices with real entries. Then, using block upper-triangular
matrices with blocks of size at most 2-by-2 on the diagonal,
we will be able to define the characteristic polynomial of an operator 
on a real vector space. 

To motivate the definition of characteristic polynomials of square matrices, we would like the following to be true (think about the Cayley-Hamilton theorem; see 8.20): if $T \in \mathcal{L}(V)$ has matrix $A$ with respect to some basis of $V$ and $q$ is the characteristic polynomial of $A$, then $q(T) = 0$.

Let's begin with the trivial case of 1-by-1 matrices. Suppose $V$ is a real vector space with dimension 1 and $T \in \mathcal{L}(V)$. If $[\lambda]$ equals the matrix of $T$ with respect to some basis of $V$, then $T$ equals $\lambda I$. Thus if we let $q$ be the degree 1 polynomial defined by $q(x) = x - \lambda$, then $q(T) = 0$. Hence we define the characteristic polynomial of $[\lambda]$ to be $x - \lambda$.

Now let's look at 2-by-2 matrices with real entries. Suppose $V$ is a real vector space with dimension 2 and $T \in \mathcal{L}(V)$. Suppose

\[ \begin{bmatrix} a & c \\ b & d \end{bmatrix} \]

is the matrix of $T$ with respect to some basis $(v_1, v_2)$ of $V$. We seek a monic polynomial $q$ of degree 2 such that $q(T) = 0$. If $b = 0$, then the matrix above is upper triangular. If in addition we were dealing with a complex vector space, then we would know that $T$ has characteristic polynomial $(z - a)(z - d)$. Thus a reasonable candidate might be $(x - a)(x - d)$, where we use $x$ instead of $z$ to emphasize that now we are working on a real vector space. Let's see if the polynomial $(x - a)(x - d)$, when applied to $T$, gives 0 even when $b \ne 0$. We have

\[ (T - aI)(T - dI)v_1 = (T - dI)(T - aI)v_1 = (T - dI)(bv_2) = bcv_1 \]

and

\[ (T - aI)(T - dI)v_2 = (T - aI)(cv_1) = bcv_2. \]

Thus $(T - aI)(T - dI)$ is not equal to 0 unless $bc = 0$. However, the equations above show that $(T - aI)(T - dI) - bcI = 0$ (because this operator equals 0 on a basis, it must equal 0 on $V$). Thus if $q(x) = (x - a)(x - d) - bc$, then $q(T) = 0$.

Motivated by the previous paragraph, we define the characteristic polynomial of a 2-by-2 matrix $\begin{bmatrix} a & c \\ b & d \end{bmatrix}$ to be $(x - a)(x - d) - bc$. Here we are concerned only with matrices with real entries. The next result shows that we have found the only reasonable definition for the characteristic polynomial of a 2-by-2 matrix.
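The identity behind this definition is easy to test on a concrete matrix. The following sketch is an illustration only; the function names are hypothetical, and the sample matrix with rows (1, 2) and (3, 4) is an arbitrary choice, not one taken from the text. It expands $(x - a)(x - d) - bc$ as $x^2 - (a + d)x + (ad - bc)$ and applies it to the matrix.

```python
def mat_mul(A, B):
    """Product of two 2-by-2 matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def char_poly_2x2(M):
    """Coefficients (alpha, beta) with (x - a)(x - d) - bc = x^2 + alpha x + beta,
    where M has rows (a, c) and (b, d)."""
    (a, c), (b, d) = M
    return (-(a + d), a * d - b * c)

def apply_monic_quadratic(alpha, beta, M):
    """Return M^2 + alpha M + beta I."""
    M2 = mat_mul(M, M)
    return [[M2[i][j] + alpha * M[i][j] + (beta if i == j else 0)
             for j in range(2)] for i in range(2)]

M = [[1, 2], [3, 4]]                      # a = 1, c = 2, b = 3, d = 4
alpha, beta = char_poly_2x2(M)
print(alpha, beta)                        # -5 -2
print(apply_monic_quadratic(alpha, beta, M))   # [[0, 0], [0, 0]]

# (M - aI)(M - dI) equals bc I, here 6 I, rather than 0.
M_minus_a = [[M[0][0] - 1, M[0][1]], [M[1][0], M[1][1] - 1]]
M_minus_d = [[M[0][0] - 4, M[0][1]], [M[1][0], M[1][1] - 4]]
print(mat_mul(M_minus_a, M_minus_d))      # [[6, 0], [0, 6]]
```

The last line shows why the correction term $-bc$ is needed: the product $(T - aI)(T - dI)$ alone gives $bc\,I$.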


9.7 Proposition: Suppose $V$ is a real vector space with dimension 2 and $T \in \mathcal{L}(V)$ has no eigenvalues. Let $p \in \mathcal{P}(\mathbf{R})$ be a monic polynomial with degree 2. Suppose $A$ is the matrix of $T$ with respect to some basis of $V$.

(a) If $p$ equals the characteristic polynomial of $A$, then $p(T) = 0$.

(b) If $p$ does not equal the characteristic polynomial of $A$, then $p(T)$ is invertible.

(Part (b) of this proposition would be false without the hypothesis that $T$ has no eigenvalues. For example, define $T \in \mathcal{L}(\mathbf{R}^2)$ by $T(x_1, x_2) = (0, x_2)$. Take $p(x) = x(x - 2)$. Then $p$ is not the characteristic polynomial of the matrix of $T$ with respect to the standard basis, but $p(T)$ is not invertible.)

Proof: We already proved (a) in our discussion above. To prove (b), let $q$ denote the characteristic polynomial of $A$ and suppose that $p \ne q$. We can write $p(x) = x^2 + \alpha_1 x + \beta_1$ and $q(x) = x^2 + \alpha_2 x + \beta_2$ for some $\alpha_1, \beta_1, \alpha_2, \beta_2 \in \mathbf{R}$. Now

\[ p(T) = p(T) - q(T) = (\alpha_1 - \alpha_2)T + (\beta_1 - \beta_2)I. \]

If $\alpha_1 = \alpha_2$, then $\beta_1 \ne \beta_2$ (otherwise we would have $p = q$). Thus if $\alpha_1 = \alpha_2$, then $p(T)$ is a nonzero multiple of the identity and hence is invertible, as desired. If $\alpha_1 \ne \alpha_2$, then

\[ p(T) = (\alpha_1 - \alpha_2)\Bigl(T - \frac{\beta_2 - \beta_1}{\alpha_1 - \alpha_2} I\Bigr), \]

which is an invertible operator because $T$ has no eigenvalues. Thus (b) holds. ■
Suppose $V$ is a real vector space with dimension 2 and $T \in \mathcal{L}(V)$ has no eigenvalues. The last proposition shows that there is precisely one monic polynomial with degree 2 that when applied to $T$ gives 0. Thus, though $T$ may have different matrices with respect to different bases, each of these matrices must have the same characteristic polynomial. For example, consider $T \in \mathcal{L}(\mathbf{R}^2)$ defined by

\[ T(x_1, x_2) = (3x_1 + 5x_2, -2x_1 - x_2). \tag{9.8} \]

The matrix of $T$ with respect to the standard basis of $\mathbf{R}^2$ is

\[ \begin{bmatrix} 3 & 5 \\ -2 & -1 \end{bmatrix}. \]

The characteristic polynomial of this matrix is $(x - 3)(x + 1) + 2 \cdot 5$, which equals $x^2 - 2x + 7$. As you should verify, the matrix of $T$ with respect to the basis $((-2, 1), (1, 2))$ equals

\[ \begin{bmatrix} 1 & -6 \\ 1 & 1 \end{bmatrix}. \]

The characteristic polynomial of this matrix is $(x - 1)(x - 1) + 1 \cdot 6$, which equals $x^2 - 2x + 7$, the same result we obtained by using the standard basis.
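The verification requested above can be sketched in a few lines. This is an illustrative check rather than part of the original text: the helper names are hypothetical, and coordinates with respect to a basis of $\mathbf{R}^2$ are found by Cramer's rule with exact fractions.

```python
from fractions import Fraction

def T(x):
    """The operator 9.8 on R^2."""
    x1, x2 = x
    return (3 * x1 + 5 * x2, -2 * x1 - x2)

def coords_2d(basis, w):
    """Coordinates of w in a basis (b1, b2) of R^2, via Cramer's rule."""
    (p, q), (r, s) = basis                 # b1 = (p, q), b2 = (r, s)
    det = Fraction(p * s - r * q)
    return ((w[0] * s - r * w[1]) / det, (p * w[1] - w[0] * q) / det)

def matrix_of(op, basis):
    """Matrix of op in the basis: column j holds the coordinates of op(basis[j])."""
    col1 = coords_2d(basis, op(basis[0]))
    col2 = coords_2d(basis, op(basis[1]))
    return [[col1[0], col2[0]], [col1[1], col2[1]]]

def char_poly_2x2(M):
    """(alpha, beta) with (x - a)(x - d) - bc = x^2 + alpha x + beta
    for M with rows (a, c) and (b, d)."""
    (a, c), (b, d) = M
    return (-(a + d), a * d - b * c)

standard = matrix_of(T, [(1, 0), (0, 1)])
other = matrix_of(T, [(-2, 1), (1, 2)])

print([[int(x) for x in row] for row in standard])   # [[3, 5], [-2, -1]]
print([[int(x) for x in row] for row in other])      # [[1, -6], [1, 1]]
print(char_poly_2x2(standard) == char_poly_2x2(other) == (-2, 7))   # True
```

Both matrices yield the coefficients $(-2, 7)$, that is, the polynomial $x^2 - 2x + 7$, even though the matrices themselves differ.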

When analyzing upper-triangular matrices of an operator $T$ on a complex vector space $V$, we found that subspaces of the form

\[ \operatorname{null}(T - \lambda I)^{\dim V} \]

played a key role (see 8.10). Those spaces will also play a role in studying operators on real vector spaces, but because we must now consider block upper-triangular matrices with 2-by-2 blocks, subspaces of the form

\[ \operatorname{null}(T^2 + \alpha T + \beta I)^{\dim V} \]

will also play a key role. To get started, let's look at one- and two-dimensional real vector spaces.

First suppose that $V$ is a one-dimensional real vector space and that $T \in \mathcal{L}(V)$. If $\lambda \in \mathbf{R}$, then $\operatorname{null}(T - \lambda I)$ equals $V$ if $\lambda$ is an eigenvalue of $T$ and $\{0\}$ otherwise. If $\alpha, \beta \in \mathbf{R}$ with $\alpha^2 < 4\beta$, then

\[ \operatorname{null}(T^2 + \alpha T + \beta I) = \{0\}. \]

(Proof: Because $V$ is one-dimensional, there is a constant $\lambda \in \mathbf{R}$ such that $Tv = \lambda v$ for all $v \in V$. Thus $(T^2 + \alpha T + \beta I)v = (\lambda^2 + \alpha\lambda + \beta)v$. However, the inequality $\alpha^2 < 4\beta$ implies that $\lambda^2 + \alpha\lambda + \beta \ne 0$, and thus $\operatorname{null}(T^2 + \alpha T + \beta I) = \{0\}$.)
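The inequality used in the proof above can be made explicit by completing the square. The following is a standard derivation added for illustration; it uses only the hypothesis $\alpha^2 < 4\beta$.

```latex
\[
\lambda^2 + \alpha\lambda + \beta
  = \Bigl(\lambda + \frac{\alpha}{2}\Bigr)^{2} + \Bigl(\beta - \frac{\alpha^{2}}{4}\Bigr)
  \;\ge\; \beta - \frac{\alpha^{2}}{4}
  \;>\; 0
  \qquad \text{for every } \lambda \in \mathbf{R}.
\]
```

In particular the scalar $\lambda^2 + \alpha\lambda + \beta$ is never 0, so the operator $T^2 + \alpha T + \beta I$ is a nonzero multiple of the identity on a one-dimensional space.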

Now suppose $V$ is a two-dimensional real vector space and $T \in \mathcal{L}(V)$ has no eigenvalues. If $\lambda \in \mathbf{R}$, then $\operatorname{null}(T - \lambda I)$ equals $\{0\}$ (because $T$ has no eigenvalues). If $\alpha, \beta \in \mathbf{R}$ with $\alpha^2 < 4\beta$, then $\operatorname{null}(T^2 + \alpha T + \beta I)$ equals $V$ if $x^2 + \alpha x + \beta$ is the characteristic polynomial of the matrix of $T$ with respect to some (or equivalently, every) basis of $V$ and equals $\{0\}$ otherwise (by 9.7). Note that for this operator, there is no middle ground: the null space of $T^2 + \alpha T + \beta I$ is either $\{0\}$ or the whole space; it cannot be one-dimensional.

Now suppose that V is a real vector space of any dimension and 
T G £(V). We know that V has a basis with respect to which T has 
a block upper-triangular matrix with blocks on the diagonal of size at 
most 2-by-2 (see 9.4). In general, this matrix is not unique—V may 
have many different bases with respect to which T has a block upper- 
triangular matrix of this form, and with respect to these different bases 
we may get different block upper-triangular matrices. 

We encountered a similar situation when dealing with complex vector spaces and upper-triangular matrices. In that case, though we might get different upper-triangular matrices with respect to the different bases, the entries on the diagonal were always the same (though possibly in a different order). Might a similar property hold for real vector spaces and block upper-triangular matrices? Specifically, is the number of times a given 2-by-2 matrix appears on the diagonal of a block upper-triangular matrix of $T$ independent of which basis is chosen? Unfortunately this question has a negative answer. For example, the operator $T \in \mathcal{L}(\mathbf{R}^2)$ defined by 9.8 has two different 2-by-2 matrices, as we saw above.

Though the number of times a particular 2-by-2 matrix might appear 
on the diagonal of a block upper-triangular matrix of T can depend on 
the choice of basis, if we look at characteristic polynomials instead 
of the actual matrices, we find that the number of times a particular 
characteristic polynomial appears is independent of the choice of basis. 
This is the content of the following theorem, which will be our key tool 
in analyzing the structure of an operator on a real vector space. 


(Recall that $\alpha^2 < 4\beta$ implies that $x^2 + \alpha x + \beta$ has no real roots; see 4.11.)
9.9 Theorem: Suppose $V$ is a real vector space and $T \in \mathcal{L}(V)$. Suppose that with respect to some basis of $V$, the matrix of $T$ is

\[ \begin{bmatrix} A_1 & & * \\ & \ddots & \\ 0 & & A_m \end{bmatrix}, \tag{9.10} \]

where each $A_j$ is a 1-by-1 matrix or a 2-by-2 matrix with no eigenvalues.

(a) If $\lambda \in \mathbf{R}$, then precisely $\dim \operatorname{null}(T - \lambda I)^{\dim V}$ of the matrices $A_1, \dots, A_m$ equal the 1-by-1 matrix $[\lambda]$.

(b) If $\alpha, \beta \in \mathbf{R}$ satisfy $\alpha^2 < 4\beta$, then precisely

\[ \frac{\dim \operatorname{null}(T^2 + \alpha T + \beta I)^{\dim V}}{2} \]

of the matrices $A_1, \dots, A_m$ have characteristic polynomial equal to $x^2 + \alpha x + \beta$.

(This result implies that $\operatorname{null}(T^2 + \alpha T + \beta I)^{\dim V}$ must have even dimension.)


This proof uses the 
same ideas as the proof 
of the analogous result 
on complex vector 
spaces (8.10). As usual, 
the real case is slightly 
more complicated but 
requires no new 
creativity. 


Proof: We will construct one proof that can be used to prove both (a) and (b). To do this, let $\lambda, \alpha, \beta \in \mathbf{R}$ with $\alpha^2 < 4\beta$. Define $p \in \mathcal{P}(\mathbf{R})$ by

\[ p(x) = \begin{cases} x - \lambda & \text{if we are trying to prove (a);} \\ x^2 + \alpha x + \beta & \text{if we are trying to prove (b).} \end{cases} \]

Let $d$ denote the degree of $p$. Thus $d = 1$ if we are trying to prove (a) and $d = 2$ if we are trying to prove (b).

We will prove this theorem by induction on $m$, the number of blocks along the diagonal of 9.10. If $m = 1$, then $\dim V = 1$ or $\dim V = 2$; the discussion preceding this theorem then implies that the desired result holds. Thus we can assume that $m > 1$ and that the desired result holds when $m$ is replaced with $m - 1$.

For convenience let $n = \dim V$. Consider a basis of $V$ with respect to which $T$ has the block upper-triangular matrix 9.10. Let $U_j$ denote the span of the basis vectors corresponding to $A_j$. Thus $\dim U_j = 1$ if $A_j$ is a 1-by-1 matrix and $\dim U_j = 2$ if $A_j$ is a 2-by-2 matrix. Let $U = U_1 + \dots + U_{m-1}$. Clearly $U$ is invariant under $T$ and the matrix of $T|_U$ with respect to the obvious basis (obtained from the basis vectors corresponding to $A_1, \dots, A_{m-1}$) is

\[ \begin{bmatrix} A_1 & & * \\ & \ddots & \\ 0 & & A_{m-1} \end{bmatrix}. \tag{9.11} \]
Thus, by our induction hypothesis,

9.12  precisely $(1/d) \dim \operatorname{null} p(T|_U)^n$ of the matrices $A_1, \dots, A_{m-1}$ have characteristic polynomial $p$.

Actually the induction hypothesis gives 9.12 with exponent $\dim U$ instead of $n$, but then we can replace $\dim U$ with $n$ (by 8.6) to get the statement above.

Suppose $u_m \in U_m$. Let $S \in \mathcal{L}(U_m)$ be the operator whose matrix (with respect to the basis corresponding to $U_m$) equals $A_m$. In particular, $Su_m = P_{U_m,U} Tu_m$. Now

\[ \begin{aligned} Tu_m &= P_{U,U_m} Tu_m + P_{U_m,U} Tu_m \\ &= *_U + Su_m, \end{aligned} \]

where $*_U$ denotes a vector in $U$. Note that $Su_m \in U_m$; thus applying $T$ to both sides of the equation above gives

\[ T^2 u_m = *_U + S^2 u_m, \]

where again $*_U$ denotes a vector in $U$, though perhaps a different vector than the previous usage of $*_U$ (the notation $*_U$ is used when we want to emphasize that we have a vector in $U$ but we do not care which particular vector; each time the notation $*_U$ is used, it may denote a different vector in $U$). The last two equations show that

\[ p(T)u_m = *_U + p(S)u_m \tag{9.13} \]

for some $*_U \in U$. Note that $p(S)u_m \in U_m$; thus iterating the last equation gives

\[ p(T)^n u_m = *_U + p(S)^n u_m \tag{9.14} \]

for some $*_U \in U$.

The proof now breaks into two cases. First consider the case where the characteristic polynomial of $A_m$ does not equal $p$. We will show that in this case

\[ \operatorname{null} p(T)^n \subset U. \tag{9.15} \]

Once this has been verified, we will know that

\[ \operatorname{null} p(T)^n = \operatorname{null} p(T|_U)^n, \]

and hence 9.12 will tell us that precisely $(1/d) \dim \operatorname{null} p(T)^n$ of the matrices $A_1, \dots, A_m$ have characteristic polynomial $p$, completing the proof in the case where the characteristic polynomial of $A_m$ does not equal $p$.

To prove 9.15 (still assuming that the characteristic polynomial of $A_m$ does not equal $p$), suppose $v \in \operatorname{null} p(T)^n$. We can write $v$ in the form $v = u + u_m$, where $u \in U$ and $u_m \in U_m$. Using 9.14, we have

\[ 0 = p(T)^n v = p(T)^n u + p(T)^n u_m = p(T)^n u + *_U + p(S)^n u_m \]

for some $*_U \in U$. Because the vectors $p(T)^n u$ and $*_U$ are in $U$ and $p(S)^n u_m \in U_m$, this implies that $p(S)^n u_m = 0$. However, $p(S)$ is invertible (see the discussion preceding this theorem about one- and two-dimensional subspaces and note that $\dim U_m \le 2$), so $u_m = 0$. Thus $v = u \in U$, completing the proof of 9.15.

Now consider the case where the characteristic polynomial of $A_m$ equals $p$. Note that this implies $\dim U_m = d$. We will show that

\[ \dim \operatorname{null} p(T)^n = \dim \operatorname{null} p(T|_U)^n + d, \tag{9.16} \]

which along with 9.12 will complete the proof.

Using the formula for the dimension of the sum of two subspaces (2.18), we have

\[ \begin{aligned} \dim \operatorname{null} p(T)^n &= \dim(U \cap \operatorname{null} p(T)^n) + \dim(U + \operatorname{null} p(T)^n) - \dim U \\ &= \dim \operatorname{null} p(T|_U)^n + \dim(U + \operatorname{null} p(T)^n) - (n - d). \end{aligned} \]

If $U + \operatorname{null} p(T)^n = V$, then $\dim(U + \operatorname{null} p(T)^n) = n$, which when combined with the last formula above for $\dim \operatorname{null} p(T)^n$ would give 9.16, as desired. Thus we will finish by showing that $U + \operatorname{null} p(T)^n = V$.

To prove that $U + \operatorname{null} p(T)^n = V$, suppose $u_m \in U_m$. Because the characteristic polynomial of the matrix of $S$ (namely, $A_m$) equals $p$, we have $p(S) = 0$. Thus $p(T)u_m \in U$ (from 9.13). Now

\[ p(T)^n u_m = p(T)^{n-1}(p(T)u_m) \in \operatorname{range} p(T|_U)^{n-1} = \operatorname{range} p(T|_U)^n, \]

where the last equality comes from 8.9. Thus we can choose $u \in U$ such that $p(T)^n u_m = p(T|_U)^n u$. Now

\[ \begin{aligned} p(T)^n (u_m - u) &= p(T)^n u_m - p(T)^n u \\ &= p(T)^n u_m - p(T|_U)^n u \\ &= 0. \end{aligned} \]

Thus $u_m - u \in \operatorname{null} p(T)^n$, and hence $u_m$, which equals $u + (u_m - u)$, is in $U + \operatorname{null} p(T)^n$. In other words, $U_m \subset U + \operatorname{null} p(T)^n$. Therefore $V = U + U_m \subset U + \operatorname{null} p(T)^n$, and hence $U + \operatorname{null} p(T)^n = V$, completing the proof. ■

As we saw in the last chapter, the eigenvalues of an operator on a complex vector space provide the key to analyzing the structure of the operator. On a real vector space, an operator may have fewer eigenvalues, counting multiplicity, than the dimension of the vector space. The previous theorem suggests a definition that makes up for this deficiency. We will see that the definition given in the next paragraph helps make operator theory on real vector spaces resemble operator theory on complex vector spaces.

Suppose $V$ is a real vector space and $T \in \mathcal{L}(V)$. An ordered pair
$(\alpha, \beta)$ of real numbers is called an eigenpair of $T$ if $\alpha^2 < 4\beta$ and

$$T^2 + \alpha T + \beta I$$

is not injective. The previous theorem shows that $T$ can have only
finitely many eigenpairs because each eigenpair corresponds to the
characteristic polynomial of a 2-by-2 matrix on the diagonal of 9.10
and there is room for only finitely many such matrices along that
diagonal. Guided by 9.9, we define the multiplicity of an eigenpair $(\alpha, \beta)$
of $T$ to be

$$\frac{\dim \operatorname{null}\,(T^2 + \alpha T + \beta I)^{\dim V}}{2}.$$

From 9.9, we see that the multiplicity of $(\alpha, \beta)$ equals the number of
times that $x^2 + \alpha x + \beta$ is the characteristic polynomial of a 2-by-2 matrix
on the diagonal of 9.10.

As an example, consider the operator $T \in \mathcal{L}(\mathbf{R}^3)$ whose matrix (with
respect to the standard basis) equals

$$\begin{bmatrix} 3 & -1 & -2 \\ 3 & 2 & -3 \\ 1 & 2 & 0 \end{bmatrix}.$$

You should verify that $(-4, 13)$ is an eigenpair of $T$ with multiplicity 1;
note that $T^2 - 4T + 13I$ is not injective because $(-1, 0, 1)$ and $(1, 1, 0)$
are in its null space. Without doing any calculations, you should verify
that $T$ has no other eigenpairs (use 9.9). You should also verify that 1 is
an eigenvalue of $T$ with multiplicity 1, with corresponding eigenvector
$(1, 0, 1)$, and that $T$ has no other eigenvalues.
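
The verifications requested in this example can also be checked numerically. The following is a minimal sketch in plain Python; the helper names `mat_mul`, `mat_vec`, and `quadratic_in` are illustrative choices and not notation from the text. It forms $T^2 - 4T + 13I$ from the matrix above, applies it to the two vectors claimed to lie in its null space, and applies the matrix of $T$ to $(1, 0, 1)$.

```python
# Matrix of T with respect to the standard basis of R^3 (from the example).
A = [[3, -1, -2],
     [3, 2, -3],
     [1, 2, 0]]


def mat_mul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_vec(X, v):
    """Apply the matrix X to the vector v."""
    return [sum(x * c for x, c in zip(row, v)) for row in X]


def quadratic_in(X, alpha, beta):
    """Return X^2 + alpha*X + beta*I."""
    n = len(X)
    X2 = mat_mul(X, X)
    return [[X2[i][j] + alpha * X[i][j] + (beta if i == j else 0)
             for j in range(n)] for i in range(n)]


# T^2 - 4T + 13I, the operator attached to the eigenpair (-4, 13).
Q = quadratic_in(A, -4, 13)

# Both vectors named in the text should be sent to the zero vector.
images = [mat_vec(Q, v) for v in [(-1, 0, 1), (1, 1, 0)]]

# (1, 0, 1) should be reproduced exactly, since its eigenvalue is 1.
eigvec_image = mat_vec(A, (1, 0, 1))

print(Q)
print(images)
print(eigvec_image)
```

Every row of `Q` is a multiple of $(1, -1, 1)$, so its null space is two-dimensional, which is consistent with the eigenpair having multiplicity 1.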


Though the word eigenpair was chosen to be consistent with the word
eigenvalue, this terminology is not in widespread use.

This proposition shows that though an operator on a real vector space
may have no eigenvalues, or it may have no eigenpairs, it cannot be
lacking in both these useful objects. It also shows that an operator
on a real vector space $V$ can have at most $(\dim V)/2$ distinct
eigenpairs.


Note that the roots of 
the characteristic 
polynomial of T equal 
the eigenvalues of T, as 
was true on complex 
vector spaces. 


In the example above, the sum of the multiplicities of the eigenvalues
of $T$ plus twice the multiplicities of the eigenpairs of $T$ equals 3,
which is the dimension of the domain of $T$. The next proposition shows
that this always happens on a real vector space.

9.17 Proposition: If $V$ is a real vector space and $T \in \mathcal{L}(V)$, then
the sum of the multiplicities of all the eigenvalues of $T$ plus the sum
of twice the multiplicities of all the eigenpairs of $T$ equals $\dim V$.

Proof: Suppose $V$ is a real vector space and $T \in \mathcal{L}(V)$. Then there
is a basis of $V$ with respect to which the matrix of $T$ is as in 9.9. The
multiplicity of an eigenvalue $\lambda$ equals the number of times the 1-by-1
matrix $[\lambda]$ appears on the diagonal of this matrix (from 9.9). The
multiplicity of an eigenpair $(\alpha, \beta)$ equals the number of times $x^2 + \alpha x + \beta$ is
the characteristic polynomial of a 2-by-2 matrix on the diagonal of this
matrix (from 9.9). Because the diagonal of this matrix has length $\dim V$,
the sum of the multiplicities of all the eigenvalues of $T$ plus the sum of
twice the multiplicities of all the eigenpairs of $T$ must equal $\dim V$. ■


Suppose $V$ is a real vector space and $T \in \mathcal{L}(V)$. With respect to
some basis of $V$, $T$ has a block upper-triangular matrix of the form

9.18    $$\begin{bmatrix} A_1 & & * \\ & \ddots & \\ 0 & & A_m \end{bmatrix},$$

where each $A_j$ is a 1-by-1 matrix or a 2-by-2 matrix with no eigenvalues
(see 9.4). We define the characteristic polynomial of $T$ to be the
product of the characteristic polynomials of $A_1, \dots, A_m$. Explicitly, for
each $j$, define $q_j \in \mathcal{P}(\mathbf{R})$ by

9.19    $$q_j(x) = \begin{cases} x - \lambda & \text{if } A_j \text{ equals } [\lambda]; \\ (x - a)(x - d) - bc & \text{if } A_j \text{ equals } \begin{bmatrix} a & c \\ b & d \end{bmatrix}. \end{cases}$$

Then the characteristic polynomial of $T$ is

$$q_1(x) \cdots q_m(x).$$

Clearly the characteristic polynomial of $T$ has degree $\dim V$.
Furthermore, 9.9 ensures that the characteristic polynomial of $T$ depends
only on $T$ and not on the choice of a particular basis.

Now we can prove a result that was promised in the last chapter,
where we proved the analogous theorem (8.20) for operators on
complex vector spaces.

9.20 Cayley-Hamilton Theorem: Suppose $V$ is a real vector space
and $T \in \mathcal{L}(V)$. Let $q$ denote the characteristic polynomial of $T$. Then
$q(T) = 0$.


This proof uses the same ideas as the proof of the analogous result
on complex vector spaces (8.20).

Proof: Choose a basis of $V$ with respect to which $T$ has a block
upper-triangular matrix of the form 9.18, where each $A_j$ is a 1-by-1
matrix or a 2-by-2 matrix with no eigenvalues. Suppose $U_j$ is the one- or
two-dimensional subspace spanned by the basis vectors corresponding
to $A_j$. Define $q_j$ as in 9.19. To prove that $q(T) = 0$, we need only show
that $q(T)|_{U_j} = 0$ for $j = 1, \dots, m$. To do this, it suffices to show that

9.21    $$q_1(T) \cdots q_j(T)|_{U_j} = 0$$

for $j = 1, \dots, m$.

We will prove 9.21 by induction on $j$. To get started, suppose that
$j = 1$. Because $\mathcal{M}(T)$ is given by 9.18, we have $q_1(T)|_{U_1} = 0$ (obvious if
$\dim U_1 = 1$; from 9.7(a) if $\dim U_1 = 2$), giving 9.21 when $j = 1$.

Now suppose that $1 < j \le m$ and that

$$\begin{aligned}
0 &= q_1(T)|_{U_1} \\
0 &= q_1(T) q_2(T)|_{U_2} \\
&\ \,\vdots \\
0 &= q_1(T) \cdots q_{j-1}(T)|_{U_{j-1}}.
\end{aligned}$$

If $v \in U_j$, then from 9.18 we see that

$$q_j(T)v = u + q_j(S)v,$$

where $u \in U_1 + \cdots + U_{j-1}$ and $S \in \mathcal{L}(U_j)$ has characteristic
polynomial $q_j$. Because $q_j(S) = 0$ (obvious if $\dim U_j = 1$; from 9.7(a) if
$\dim U_j = 2$), the equation above shows that

$$q_j(T)v \in U_1 + \cdots + U_{j-1}$$

whenever $v \in U_j$. Thus, by our induction hypothesis, $q_1(T) \cdots q_{j-1}(T)$
applied to $q_j(T)v$ gives 0 whenever $v \in U_j$. In other words, 9.21 holds,
completing the proof. ■

Either $m$ or $M$ might be 0.


This proof uses the 
same ideas as the proof 
of the analogous result 
on complex vector 
spaces (8.23). 


Suppose $V$ is a real vector space and $T \in \mathcal{L}(V)$. Clearly the
Cayley-Hamilton theorem (9.20) implies that the minimal polynomial of $T$ has
degree at most $\dim V$, as was the case on complex vector spaces. If
the degree of the minimal polynomial of $T$ equals $\dim V$, then, as was
also the case on complex vector spaces, the minimal polynomial of $T$
must equal the characteristic polynomial of $T$. This follows from the
Cayley-Hamilton theorem (9.20) and 8.34.

Finally, we can now prove a major structure theorem about operators
on real vector spaces. The theorem below should be compared
to 8.23, the corresponding result on complex vector spaces.

9.22 Theorem: Suppose $V$ is a real vector space and $T \in \mathcal{L}(V)$. Let
$\lambda_1, \dots, \lambda_m$ be the distinct eigenvalues of $T$, with $U_1, \dots, U_m$ the
corresponding sets of generalized eigenvectors. Let $(\alpha_1, \beta_1), \dots, (\alpha_M, \beta_M)$
be the distinct eigenpairs of $T$ and let $V_j = \operatorname{null}\,(T^2 + \alpha_j T + \beta_j I)^{\dim V}$.
Then

(a) $V = U_1 \oplus \cdots \oplus U_m \oplus V_1 \oplus \cdots \oplus V_M$;

(b) each $U_j$ and each $V_j$ is invariant under $T$;

(c) each $(T - \lambda_j I)|_{U_j}$ and each $(T^2 + \alpha_j T + \beta_j I)|_{V_j}$ is nilpotent.

Proof: From 8.22, we get (b). Clearly (c) follows from the definitions.

To prove (a), recall that $\dim U_j$ equals the multiplicity of $\lambda_j$ as an
eigenvalue of $T$ and $\dim V_j$ equals twice the multiplicity of $(\alpha_j, \beta_j)$ as
an eigenpair of $T$. Thus

9.23    $\dim V = \dim U_1 + \cdots + \dim U_m + \dim V_1 + \cdots + \dim V_M;$

this follows from 9.17. Let $U = U_1 + \cdots + U_m + V_1 + \cdots + V_M$. Note
that $U$ is invariant under $T$. Thus we can define $S \in \mathcal{L}(U)$ by

$$S = T|_U.$$

Note that $S$ has the same eigenvalues, with the same multiplicities, as $T$
because all the generalized eigenvectors of $T$ are in $U$, the domain of $S$.
Similarly, $S$ has the same eigenpairs, with the same multiplicities, as $T$.
Thus applying 9.17 to $S$, we get

$$\dim U = \dim U_1 + \cdots + \dim U_m + \dim V_1 + \cdots + \dim V_M.$$

This equation, along with 9.23, shows that $\dim V = \dim U$. Because $U$
is a subspace of $V$, this implies that $V = U$. In other words,

$$V = U_1 + \cdots + U_m + V_1 + \cdots + V_M.$$

This equation, along with 9.23, allows us to use 2.19 to conclude that
(a) holds, completing the proof. ■

Clearly Exercise 4 is a stronger statement than Exercise 3. Even
so, you may want to do Exercise 3 first because it is easier than
Exercise 4.


Exercises


1. Prove that 1 is an eigenvalue of every square matrix with the
property that the sum of the entries in each row equals 1.

2. Consider a 2-by-2 matrix of real numbers

$$A = \begin{bmatrix} a & c \\ b & d \end{bmatrix}.$$

Prove that $A$ has an eigenvalue (in $\mathbf{R}$) if and only if

$$(a - d)^2 + 4bc \ge 0.$$

3. Suppose $A$ is a block diagonal matrix

$$A = \begin{bmatrix} A_1 & & 0 \\ & \ddots & \\ 0 & & A_m \end{bmatrix},$$

where each $A_j$ is a square matrix. Prove that the set of eigenvalues
of $A$ equals the union of the eigenvalues of $A_1, \dots, A_m$.

4. Suppose $A$ is a block upper-triangular matrix

$$A = \begin{bmatrix} A_1 & & * \\ & \ddots & \\ 0 & & A_m \end{bmatrix},$$

where each $A_j$ is a square matrix. Prove that the set of eigenvalues
of $A$ equals the union of the eigenvalues of $A_1, \dots, A_m$.

5. Suppose $V$ is a real vector space and $T \in \mathcal{L}(V)$. Suppose $\alpha, \beta \in \mathbf{R}$
are such that $T^2 + \alpha T + \beta I = 0$. Prove that $T$ has an eigenvalue
if and only if $\alpha^2 \ge 4\beta$.

6. Suppose $V$ is a real inner-product space and $T \in \mathcal{L}(V)$. Prove
that there is an orthonormal basis of $V$ with respect to which $T$
has a block upper-triangular matrix

$$\begin{bmatrix} A_1 & & * \\ & \ddots & \\ 0 & & A_m \end{bmatrix},$$

where each $A_j$ is a 1-by-1 matrix or a 2-by-2 matrix with no
eigenvalues.

7. Prove that if $T \in \mathcal{L}(V)$ and $j$ is a positive integer such that
$j \le \dim V$, then $T$ has an invariant subspace whose dimension
equals $j - 1$ or $j$.

8. Prove that there does not exist an operator $T \in \mathcal{L}(\mathbf{R}^7)$ such that
$T^2 + T + I$ is nilpotent.

9. Give an example of an operator $T \in \mathcal{L}(\mathbf{C}^7)$ such that $T^2 + T + I$
is nilpotent.

10. Suppose $V$ is a real vector space and $T \in \mathcal{L}(V)$. Suppose $\alpha, \beta \in \mathbf{R}$
are such that $\alpha^2 < 4\beta$. Prove that

$$\operatorname{null}\,(T^2 + \alpha T + \beta I)^k$$

has even dimension for every positive integer $k$.

11. Suppose $V$ is a real vector space and $T \in \mathcal{L}(V)$. Suppose $\alpha, \beta \in \mathbf{R}$
are such that $\alpha^2 < 4\beta$ and $T^2 + \alpha T + \beta I$ is nilpotent. Prove that
$\dim V$ is even and

$$(T^2 + \alpha T + \beta I)^{\dim V / 2} = 0.$$

12. Prove that if $T \in \mathcal{L}(\mathbf{R}^3)$ and 5, 7 are eigenvalues of $T$, then $T$ has
no eigenpairs.

13. Suppose $V$ is a real vector space with $\dim V = n$ and $T \in \mathcal{L}(V)$
is such that

$$\operatorname{null} T^{n-2} \ne \operatorname{null} T^{n-1}.$$

Prove that $T$ has at most two distinct eigenvalues and that $T$ has
no eigenpairs.

14. Suppose $V$ is a vector space with dimension 2 and $T \in \mathcal{L}(V)$.
Prove that if

$$\begin{bmatrix} a & c \\ b & d \end{bmatrix}$$

is the matrix of $T$ with respect to some basis of $V$, then the
characteristic polynomial of $T$ equals $(z - a)(z - d) - bc$.

15. Suppose $V$ is a real inner-product space and $S \in \mathcal{L}(V)$ is an
isometry. Prove that if $(\alpha, \beta)$ is an eigenpair of $S$, then $\beta = 1$.

You do not need to find the eigenvalues of $T$ to do this exercise. As
usual unless otherwise specified, here $V$ may be a real or complex
vector space.

Throughout this book our emphasis has been on linear maps and
operators rather than on matrices. In this chapter we pay more attention
to matrices as we define and discuss traces and determinants.
Determinants appear only at the end of this book because we replaced their
usual applications in linear algebra (the definition of the
characteristic polynomial and the proof that operators on complex vector spaces
have eigenvalues) with more natural techniques. The book concludes
with an explanation of the important role played by determinants in
the theory of volume and integration.

Recall that $\mathbf{F}$ denotes $\mathbf{R}$ or $\mathbf{C}$.

Also, $V$ is a finite-dimensional, nonzero vector space over $\mathbf{F}$.

Some mathematicians use the terms nonsingular, which means the same as
invertible, and singular, which means the same as noninvertible.

Change of Basis

The matrix of an operator $T \in \mathcal{L}(V)$ depends on a choice of basis
of $V$. Two different bases of $V$ may give different matrices of $T$. In this
section we will learn how these matrices are related. This information
will help us find formulas for the trace and determinant of $T$ later in
this chapter.

With respect to any basis of $V$, the identity operator $I \in \mathcal{L}(V)$ has a
diagonal matrix

$$\begin{bmatrix} 1 & & 0 \\ & \ddots & \\ 0 & & 1 \end{bmatrix}.$$

This matrix is called the identity matrix and is denoted $I$. Note that we
use the symbol $I$ to denote the identity operator (on all vector spaces)
and the identity matrix (of all possible sizes). You should always be
able to tell from the context which particular meaning of $I$ is intended.
For example, consider the equation

$$\mathcal{M}(I) = I;$$

on the left side $I$ denotes the identity operator and on the right side $I$
denotes the identity matrix.

If $A$ is a square matrix (with entries in $\mathbf{F}$, as usual) with the same
size as $I$, then $AI = IA = A$, as you should verify. A square matrix $A$
is called invertible if there is a square matrix $B$ of the same size such
that $AB = BA = I$, and we call $B$ an inverse of $A$. To prove that $A$ has
at most one inverse, suppose $B$ and $B'$ are inverses of $A$. Then

$$B = BI = B(AB') = (BA)B' = IB' = B',$$

and hence $B = B'$, as desired. Because an inverse is unique, we can use
the notation $A^{-1}$ to denote the inverse of $A$ (if $A$ is invertible). In other
words, if $A$ is invertible, then $A^{-1}$ is the unique matrix of the same size
such that $AA^{-1} = A^{-1}A = I$.

Recall that when discussing linear maps from one vector space to
another in Chapter 3, we defined the matrix of a linear map with respect
to two bases: one basis for the first vector space and another basis for
the second vector space. When we study operators, which are linear
maps from a vector space to itself, we almost always use the same basis
for both vector spaces (after all, the two vector spaces in question are
equal). Thus we usually refer to the matrix of an operator with respect
to a basis, meaning that we are using one basis in two capacities. The
next proposition is one of the rare cases where we need to use two
different bases even though we have an operator from a vector space
to itself.

Let's review how matrix multiplication interacts with multiplication
of linear maps. Suppose that along with $V$ we have two other
finite-dimensional vector spaces, say $U$ and $W$. Let $(u_1, \dots, u_p)$ be a basis
of $U$, let $(v_1, \dots, v_n)$ be a basis of $V$, and let $(w_1, \dots, w_m)$ be a basis
of $W$. If $T \in \mathcal{L}(U, V)$ and $S \in \mathcal{L}(V, W)$, then $ST \in \mathcal{L}(U, W)$ and

10.1    $$\mathcal{M}(ST, (u_1, \dots, u_p), (w_1, \dots, w_m)) = \mathcal{M}(S, (v_1, \dots, v_n), (w_1, \dots, w_m))\, \mathcal{M}(T, (u_1, \dots, u_p), (v_1, \dots, v_n)).$$

The equation above holds because we defined matrix multiplication to
make it true (see 3.11 and the material following it).

The following proposition deals with the matrix of the identity
operator when we use two different bases. Note that the $k$th column of
$\mathcal{M}(I, (u_1, \dots, u_n), (v_1, \dots, v_n))$ consists of the scalars needed to write
$u_k$ as a linear combination of the $v$'s. As an example of the
proposition below, consider the bases $((4, 2), (5, 3))$ and $((1, 0), (0, 1))$ of $\mathbf{F}^2$.
Obviously

$$\mathcal{M}(I, ((4, 2), (5, 3)), ((1, 0), (0, 1))) = \begin{bmatrix} 4 & 5 \\ 2 & 3 \end{bmatrix}.$$

The inverse of the matrix above is $\begin{bmatrix} 3/2 & -5/2 \\ -1 & 2 \end{bmatrix}$, as you should verify. Thus
the proposition below implies that

$$\mathcal{M}(I, ((1, 0), (0, 1)), ((4, 2), (5, 3))) = \begin{bmatrix} 3/2 & -5/2 \\ -1 & 2 \end{bmatrix}.$$

10.2 Proposition: If $(u_1, \dots, u_n)$ and $(v_1, \dots, v_n)$ are bases of $V$,
then $\mathcal{M}(I, (u_1, \dots, u_n), (v_1, \dots, v_n))$ is invertible and

$$\mathcal{M}(I, (u_1, \dots, u_n), (v_1, \dots, v_n))^{-1} = \mathcal{M}(I, (v_1, \dots, v_n), (u_1, \dots, u_n)).$$

Proof: In 10.1, replace $U$ and $W$ with $V$, replace $w_j$ with $u_j$, and
replace $S$ and $T$ with $I$, getting

$$I = \mathcal{M}(I, (v_1, \dots, v_n), (u_1, \dots, u_n))\, \mathcal{M}(I, (u_1, \dots, u_n), (v_1, \dots, v_n)).$$

Now interchange the roles of the $u$'s and $v$'s, getting

$$I = \mathcal{M}(I, (u_1, \dots, u_n), (v_1, \dots, v_n))\, \mathcal{M}(I, (v_1, \dots, v_n), (u_1, \dots, u_n)).$$

These two equations give the desired result. ■
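
The example with the bases $((4, 2), (5, 3))$ and $((1, 0), (0, 1))$ can be checked with exact rational arithmetic. This is only an illustrative sketch: the names `P`, `P_inv`, `mat_mul`, and `rebuilt` are assumptions made for the illustration and do not appear in the text. It confirms that the two matrices are inverses of each other and that the columns of the second matrix really do express $(1, 0)$ and $(0, 1)$ in terms of $(4, 2)$ and $(5, 3)$.

```python
from fractions import Fraction


def mat_mul(X, Y):
    """Product of matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


# M(I, ((4,2),(5,3)), ((1,0),(0,1))): column k lists the standard
# coordinates of the k-th vector of the basis ((4,2),(5,3)).
P = [[Fraction(4), Fraction(5)],
     [Fraction(2), Fraction(3)]]

# The inverse stated in the text, which 10.2 identifies with
# M(I, ((1,0),(0,1)), ((4,2),(5,3))).
P_inv = [[Fraction(3, 2), Fraction(-5, 2)],
         [Fraction(-1), Fraction(2)]]

identity = [[1, 0], [0, 1]]
products = (mat_mul(P_inv, P), mat_mul(P, P_inv))

# Column k of P_inv should hold the scalars that write the k-th standard
# basis vector as a linear combination of (4,2) and (5,3).
basis = [(4, 2), (5, 3)]
rebuilt = [tuple(sum(P_inv[j][k] * basis[j][i] for j in range(2))
                 for i in range(2)) for k in range(2)]

print(products[0] == identity, products[1] == identity)
print(rebuilt)
```

Using `Fraction` rather than floating point keeps the entries $3/2$ and $-5/2$ exact, so the comparison with the identity matrix is an equality and not an approximation.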

Now we can see how the matrix of T changes when we change 
bases. 

10.3 Theorem: Suppose $T \in \mathcal{L}(V)$. Let $(u_1, \dots, u_n)$ and $(v_1, \dots, v_n)$
be bases of $V$. Let $A = \mathcal{M}(I, (u_1, \dots, u_n), (v_1, \dots, v_n))$. Then

10.4    $\mathcal{M}(T, (u_1, \dots, u_n)) = A^{-1} \mathcal{M}(T, (v_1, \dots, v_n)) A.$

Proof: In 10.1, replace $U$ and $W$ with $V$, replace $w_j$ with $v_j$, replace
$T$ with $I$, and replace $S$ with $T$, getting

10.5    $\mathcal{M}(T, (u_1, \dots, u_n), (v_1, \dots, v_n)) = \mathcal{M}(T, (v_1, \dots, v_n)) A.$

Again use 10.1, this time replacing $U$ and $W$ with $V$, replacing $w_j$
with $u_j$, and replacing $S$ with $I$, getting

$$\mathcal{M}(T, (u_1, \dots, u_n)) = A^{-1} \mathcal{M}(T, (u_1, \dots, u_n), (v_1, \dots, v_n)),$$

where we have used 10.2. Substituting 10.5 into the equation above
gives 10.4, completing the proof. ■
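
Formula 10.4 can be tried out on a small case. The sketch below is an illustration under stated assumptions: the operator (with matrix `M_v` in the standard basis) is a hypothetical choice made here, the $u$ basis is borrowed from the earlier example, and the helper names are not from the text. It computes $A^{-1}\mathcal{M}(T, (v_1, v_2))A$ and then checks independently that column $k$ of the result gives the coordinates of $Tu_k$ in the basis $(u_1, u_2)$.

```python
from fractions import Fraction


def mat_mul(X, Y):
    """Product of matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


# Hypothetical operator T on F^2: its matrix in the standard basis (v_1, v_2).
M_v = [[Fraction(1), Fraction(2)],
       [Fraction(3), Fraction(4)]]

# Basis (u_1, u_2) = ((4,2),(5,3)); A = M(I, (u_1,u_2), (v_1,v_2)).
u = [(4, 2), (5, 3)]
A = [[Fraction(4), Fraction(5)],
     [Fraction(2), Fraction(3)]]
A_inv = [[Fraction(3, 2), Fraction(-5, 2)],
         [Fraction(-1), Fraction(2)]]

# Right side of 10.4: the matrix of T with respect to (u_1, u_2).
M_u = mat_mul(A_inv, mat_mul(M_v, A))

# Independent check. First compute T u_k directly in standard coordinates.
T_u = [tuple(sum(M_v[i][j] * u_k[j] for j in range(2)) for i in range(2))
       for u_k in u]

# Then rebuild T u_k as the combination of u_1, u_2 given by column k of M_u.
rebuilt = [tuple(sum(M_u[j][k] * u[j][i] for j in range(2))
                 for i in range(2)) for k in range(2)]

print(M_u)
print(T_u == rebuilt)
```

In this instance the diagonal entries of `M_u` and of `M_v` have the same sum, even though the two matrices look quite different.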

Trace 

Let's examine the characteristic polynomial more closely than we
did in the last two chapters. If $V$ is an $n$-dimensional complex vector
space and $T \in \mathcal{L}(V)$, then the characteristic polynomial of $T$ equals

$$(z - \lambda_1) \cdots (z - \lambda_n),$$

where $\lambda_1, \dots, \lambda_n$ are the eigenvalues of $T$, repeated according to
multiplicity. Expanding the polynomial above, we can write the characteristic
polynomial of $T$ in the form

10.6    $$z^n - (\lambda_1 + \cdots + \lambda_n) z^{n-1} + \cdots + (-1)^n (\lambda_1 \cdots \lambda_n).$$

If $V$ is an $n$-dimensional real vector space and $T \in \mathcal{L}(V)$, then the
characteristic polynomial of $T$ equals

$$(x - \lambda_1) \cdots (x - \lambda_m)(x^2 + \alpha_1 x + \beta_1) \cdots (x^2 + \alpha_M x + \beta_M),$$

where $\lambda_1, \dots, \lambda_m$ are the eigenvalues of $T$ and $(\alpha_1, \beta_1), \dots, (\alpha_M, \beta_M)$ are
the eigenpairs of $T$, each repeated according to multiplicity. Expanding
the polynomial above, we can write the characteristic polynomial of $T$
in the form

10.7    $$x^n - (\lambda_1 + \cdots + \lambda_m - \alpha_1 - \cdots - \alpha_M) x^{n-1} + \cdots + (-1)^m (\lambda_1 \cdots \lambda_m \beta_1 \cdots \beta_M).$$

In this section we will study the coefficient of $z^{n-1}$ (usually denoted
$x^{n-1}$ when we are dealing with a real vector space) in the characteristic
polynomial. In the next section we will study the constant term in the
characteristic polynomial.

For $T \in \mathcal{L}(V)$, the negative of the coefficient of $z^{n-1}$ (or $x^{n-1}$ for real
vector spaces) in the characteristic polynomial of $T$ is called the trace
of $T$, denoted $\operatorname{trace} T$. If $V$ is a complex vector space, then 10.6 shows
that $\operatorname{trace} T$ equals the sum of the eigenvalues of $T$, counting
multiplicity. If $V$ is a real vector space, then 10.7 shows that $\operatorname{trace} T$ equals the
sum of the eigenvalues of $T$ minus the sum of the first coordinates of
the eigenpairs of $T$, each repeated according to multiplicity.

For example, suppose $T \in \mathcal{L}(\mathbf{C}^3)$ is the operator whose matrix is

10.8    $$\begin{bmatrix} 3 & -1 & -2 \\ 3 & 2 & -3 \\ 1 & 2 & 0 \end{bmatrix}.$$

Then the eigenvalues of $T$ are $1$, $2 + 3i$, and $2 - 3i$, each with
multiplicity 1, as you can verify. Computing the sum of the eigenvalues, we
have $\operatorname{trace} T = 1 + (2 + 3i) + (2 - 3i)$; in other words, $\operatorname{trace} T = 5$.

As another example, suppose $T \in \mathcal{L}(\mathbf{R}^3)$ is the operator whose
matrix is also given by 10.8 (note that in the previous paragraph we were
working on a complex vector space; now we are working on a real
vector space). Then 1 is the only eigenvalue of $T$ (it has multiplicity 1)
and $(-4, 13)$ is the only eigenpair of $T$ (it has multiplicity 1), as you
should have verified in the last chapter (see page 205). Computing the
sum of the eigenvalues minus the sum of the first coordinates of the
eigenpairs, we have $\operatorname{trace} T = 1 - (-4)$; in other words, $\operatorname{trace} T = 5$.
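
As a cross-check on both examples, the characteristic polynomial can be expanded by hand from the eigenvalue 1 and the eigenpair $(-4, 13)$. This is only a worked expansion of the definitions above, written out here for convenience:

```latex
\begin{aligned}
(x - 1)(x^2 - 4x + 13) &= x^3 - 4x^2 + 13x - x^2 + 4x - 13 \\
                       &= x^3 - 5x^2 + 17x - 13 .
\end{aligned}
```

The coefficient of $x^2$ is $-5$, so its negative is 5, in agreement with $1 - (-4)$. The same polynomial arises on $\mathbf{C}^3$ because $(x - (2 + 3i))(x - (2 - 3i)) = x^2 - 4x + 13$.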


Here $m$ or $M$ might equal 0.

Recall that a pair $(\alpha, \beta)$ of real numbers is an eigenpair of $T$ if
$\alpha^2 < 4\beta$ and $T^2 + \alpha T + \beta I$ is not injective.

Note that $\operatorname{trace} T$ depends only on $T$ and not on a basis of $V$
because the characteristic polynomial of $T$ does not depend on a choice
of basis.

You should carefully review 9.9 to understand the relationship between
eigenpairs and characteristic polynomials of 2-by-2 blocks.

The reason that the operators in the two previous examples have
the same trace will become clear after we find a formula (valid on both
complex and real vector spaces) for computing the trace of an operator
from its matrix.

Most of the rest of this section is devoted to discovering how to
calculate $\operatorname{trace} T$ from the matrix of $T$ (with respect to an arbitrary basis).
Let's start with the easiest situation. Suppose $V$ is a complex vector
space, $T \in \mathcal{L}(V)$, and we choose a basis of $V$ with respect to which
$T$ has an upper-triangular matrix $A$. Then the eigenvalues of $T$ are
precisely the diagonal entries of $A$, repeated according to multiplicity
(see 8.10). Thus $\operatorname{trace} T$ equals the sum of the diagonal entries of $A$.
The same formula works for the operator $T \in \mathcal{L}(\mathbf{F}^3)$ whose matrix is
given by 10.8 and whose trace equals 5. Could such a simple formula
be true in general?

We begin our investigation by considering $T \in \mathcal{L}(V)$ where $V$ is a
real vector space. Choose a basis of $V$ with respect to which $T$ has a
block upper-triangular matrix $\mathcal{M}(T)$, where each block on the diagonal
is a 1-by-1 matrix containing an eigenvalue of $T$ or a 2-by-2 block with
no eigenvalues (see 9.4 and 9.9). Each entry in a 1-by-1 block on the
diagonal of $\mathcal{M}(T)$ is an eigenvalue of $T$ and thus makes a contribution
to $\operatorname{trace} T$. If $\mathcal{M}(T)$ has any 2-by-2 blocks on the diagonal, consider a
typical one

$$\begin{bmatrix} a & c \\ b & d \end{bmatrix}.$$

The characteristic polynomial of this 2-by-2 matrix is $(x - a)(x - d) - bc$,
which equals

$$x^2 - (a + d)x + (ad - bc).$$

Thus $(-a - d, ad - bc)$ is an eigenpair of $T$. The negative of the first
coordinate of this eigenpair, namely, $a + d$, is the contribution of this
block to $\operatorname{trace} T$. Note that $a + d$ is the sum of the entries on the
diagonal of this block. Thus for any basis of $V$ with respect to which
the matrix of $T$ has the block upper-triangular form required by 9.4
and 9.9, $\operatorname{trace} T$ equals the sum of the entries on the diagonal.

At this point you should suspect that $\operatorname{trace} T$ equals the sum of
the diagonal entries of the matrix of $T$ with respect to an arbitrary
basis. Remarkably, this turns out to be true. To prove it, let's
define the trace of a square matrix $A$, denoted $\operatorname{trace} A$, to be the sum
of the diagonal entries. With this notation, we want to prove that
$\operatorname{trace} T = \operatorname{trace} \mathcal{M}(T, (v_1, \dots, v_n))$, where $(v_1, \dots, v_n)$ is an arbitrary
basis of $V$. We already know this is true if $(v_1, \dots, v_n)$ is a basis with
respect to which $T$ has an upper-triangular matrix (if $V$ is complex) or
an appropriate block upper-triangular matrix (if $V$ is real). We will need
the following proposition to prove our trace formula for an arbitrary
basis.

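10.9 Proposition: If $A$ and $B$ are square matrices of the same size,
then

$$\operatorname{trace}(AB) = \operatorname{trace}(BA).$$

Proof: Suppose

$$A = \begin{bmatrix} a_{1,1} & \cdots & a_{1,n} \\ \vdots & & \vdots \\ a_{n,1} & \cdots & a_{n,n} \end{bmatrix}, \qquad B = \begin{bmatrix} b_{1,1} & \cdots & b_{1,n} \\ \vdots & & \vdots \\ b_{n,1} & \cdots & b_{n,n} \end{bmatrix}.$$

The $j$th term on the diagonal of $AB$ equals

$$\sum_{k=1}^{n} a_{j,k} b_{k,j}.$$

Thus

$$\begin{aligned}
\operatorname{trace}(AB) &= \sum_{j=1}^{n} \sum_{k=1}^{n} a_{j,k} b_{k,j} \\
&= \sum_{k=1}^{n} \sum_{j=1}^{n} b_{k,j} a_{j,k} \\
&= \sum_{k=1}^{n} \bigl(k\text{th term on the diagonal of } BA\bigr) \\
&= \operatorname{trace}(BA),
\end{aligned}$$

as desired. ■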
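
A small numerical instance may make the index manipulation in this proof concrete. In the sketch below the two matrices are arbitrary choices made for illustration (they are not from the text), and `mat_mul` and `trace` are hypothetical helper names. The matrices do not commute, yet the traces of the two products agree, and both equal the double sum $\sum_j \sum_k a_{j,k} b_{k,j}$ that appears in the proof.

```python
def mat_mul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def trace(X):
    """Sum of the diagonal entries of a square matrix."""
    return sum(X[i][i] for i in range(len(X)))


# Two illustrative 2-by-2 matrices that do not commute.
A = [[1, 2],
     [3, 4]]
B = [[0, 1],
     [5, -2]]

AB = mat_mul(A, B)
BA = mat_mul(B, A)

# The double sum from the proof, computed without forming either product.
n = len(A)
double_sum = sum(A[j][k] * B[k][j] for j in range(n) for k in range(n))

print(AB, BA)
print(trace(AB), trace(BA), double_sum)
```

The products `AB` and `BA` differ entry by entry; only the sums of their diagonals coincide, which is all that 10.9 asserts.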

Now we can prove that the sum of the diagonal entries of the matrix 
of an operator is independent of the basis with respect to which the 
matrix is computed. 

10.10 Corollary: Suppose $T \in \mathcal{L}(V)$. If $(u_1, \dots, u_n)$ and $(v_1, \dots, v_n)$
are bases of $V$, then

$$\operatorname{trace} \mathcal{M}(T, (u_1, \dots, u_n)) = \operatorname{trace} \mathcal{M}(T, (v_1, \dots, v_n)).$$

Proof: Suppose $(u_1, \dots, u_n)$ and $(v_1, \dots, v_n)$ are bases of $V$. Let
$A = \mathcal{M}(I, (u_1, \dots, u_n), (v_1, \dots, v_n))$. Then

$$\begin{aligned}
\operatorname{trace} \mathcal{M}(T, (u_1, \dots, u_n)) &= \operatorname{trace}\bigl(A^{-1}(\mathcal{M}(T, (v_1, \dots, v_n))A)\bigr) \\
&= \operatorname{trace}\bigl((\mathcal{M}(T, (v_1, \dots, v_n))A)A^{-1}\bigr) \\
&= \operatorname{trace} \mathcal{M}(T, (v_1, \dots, v_n)),
\end{aligned}$$

where the first equality follows from 10.3 and the second equality
follows from 10.9. The third equality completes the proof. ■

The third equality here depends on the associative property of
matrix multiplication.

The theorem below states that the trace of an operator equals the 
sum of the diagonal entries of the matrix of the operator. This theorem 
does not specify a basis because, by the corollary above, the sum of 
the diagonal entries of the matrix of an operator is the same for every 
choice of basis. 

10.11 Theorem: If $T \in \mathcal{L}(V)$, then $\operatorname{trace} T = \operatorname{trace} \mathcal{M}(T)$.

Proof: Let $T \in \mathcal{L}(V)$. As noted above, $\operatorname{trace} \mathcal{M}(T)$ is independent
of which basis of $V$ we choose (by 10.10). Thus to show that

$$\operatorname{trace} T = \operatorname{trace} \mathcal{M}(T)$$

for every basis of $V$, we need only show that the equation above holds
for some basis of $V$. We already did this (on page 218), choosing a basis
of $V$ with respect to which $\mathcal{M}(T)$ is an upper-triangular matrix (if $V$ is a
complex vector space) or an appropriate block upper-triangular matrix
(if $V$ is a real vector space). ■

If we know the matrix of an operator on a complex vector space, the
theorem above allows us to find the sum of all the eigenvalues without
finding any of the eigenvalues. For example, consider the operator
on $\mathbf{C}^5$ whose matrix is

$$\begin{bmatrix} 0 & 0 & 0 & 0 & -3 \\ 1 & 0 & 0 & 0 & 6 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \end{bmatrix}.$$

No one knows an exact formula for any of the eigenvalues of this
operator. However, we do know that the sum of the eigenvalues equals 0
because the sum of the diagonal entries of the matrix above equals 0.

The theorem above also allows us easily to prove some useful
properties about traces of operators by shifting to the language of traces
of matrices, where certain properties have already been proved or are
obvious. We carry out this procedure in the next corollary.

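10.12 Corollary: If $S, T \in \mathcal{L}(V)$, then

$$\operatorname{trace}(ST) = \operatorname{trace}(TS) \quad\text{and}\quad \operatorname{trace}(S + T) = \operatorname{trace} S + \operatorname{trace} T.$$

Proof: Suppose $S, T \in \mathcal{L}(V)$. Choose any basis of $V$. Then

$$\begin{aligned}
\operatorname{trace}(ST) &= \operatorname{trace} \mathcal{M}(ST) \\
&= \operatorname{trace}\bigl(\mathcal{M}(S)\mathcal{M}(T)\bigr) \\
&= \operatorname{trace}\bigl(\mathcal{M}(T)\mathcal{M}(S)\bigr) \\
&= \operatorname{trace} \mathcal{M}(TS) \\
&= \operatorname{trace}(TS),
\end{aligned}$$

where the first and last equalities come from 10.11 and the middle
equality comes from 10.9. This completes the proof of the first
assertion in the corollary.

To prove the second assertion in the corollary, note that

$$\begin{aligned}
\operatorname{trace}(S + T) &= \operatorname{trace} \mathcal{M}(S + T) \\
&= \operatorname{trace}\bigl(\mathcal{M}(S) + \mathcal{M}(T)\bigr) \\
&= \operatorname{trace} \mathcal{M}(S) + \operatorname{trace} \mathcal{M}(T) \\
&= \operatorname{trace} S + \operatorname{trace} T,
\end{aligned}$$

where again the first and last equalities come from 10.11; the third
equality is obvious from the definition of the trace of a matrix. This
completes the proof of the second assertion in the corollary. ■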

The techniques we have developed have the following curious
corollary. The generalization of this result to infinite-dimensional vector
spaces has important consequences in quantum theory.

The statement of this corollary does not involve traces, though
the short proof uses traces. Whenever something like this
happens in mathematics, we can be sure that a good
definition lurks in the background.

Note that $\det T$ depends only on $T$ and not on a basis of $V$
because the characteristic polynomial of $T$ does not depend on a choice
of basis.

10.13 Corollary: There do not exist operators $S, T \in \mathcal{L}(V)$ such that
$ST - TS = I$.

Proof: Suppose $S, T \in \mathcal{L}(V)$. Then

$$\begin{aligned}
\operatorname{trace}(ST - TS) &= \operatorname{trace}(ST) - \operatorname{trace}(TS) \\
&= 0,
\end{aligned}$$

where the second equality comes from 10.12. Clearly the trace of $I$
equals $\dim V$, which is not 0. Because $ST - TS$ and $I$ have different
traces, they cannot be equal. ■

Determinant of an Operator 

For $T \in \mathcal{L}(V)$, we define the determinant of $T$, denoted $\det T$, to
be $(-1)^{\dim V}$ times the constant term in the characteristic polynomial
of $T$. The motivation for the factor $(-1)^{\dim V}$ in this definition comes
from 10.6.

If $V$ is a complex vector space, then $\det T$ equals the product of
the eigenvalues of $T$, counting multiplicity; this follows immediately
from 10.6. Recall that if $V$ is a complex vector space, then there is
a basis of $V$ with respect to which $T$ has an upper-triangular matrix
(see 5.13); thus $\det T$ equals the product of the diagonal entries of this
matrix (see 8.10).

If $V$ is a real vector space, then $\det T$ equals the product of the
eigenvalues of $T$ times the product of the second coordinates of the
eigenpairs of $T$, each repeated according to multiplicity; this follows
from 10.7 and the observation that $m = \dim V - 2M$ (in the notation
of 10.7), and hence $(-1)^m = (-1)^{\dim V}$.

For example, suppose $T \in \mathcal{L}(\mathbf{C}^3)$ is the operator whose matrix is
given by 10.8. As we noted in the last section, the eigenvalues of $T$ are
$1$, $2 + 3i$, and $2 - 3i$, each with multiplicity 1. Computing the product
of the eigenvalues, we have $\det T = (1)(2 + 3i)(2 - 3i)$; in other words,
$\det T = 13$.

As another example, suppose $T \in \mathcal{L}(\mathbf{R}^3)$ is the operator whose
matrix is also given by 10.8 (note that in the previous paragraph we were
working on a complex vector space; now we are working on a real
vector space). Then, as we noted earlier, 1 is the only eigenvalue of $T$ (it
has multiplicity 1) and $(-4, 13)$ is the only eigenpair of $T$ (it has
multiplicity 1). Computing the product of the eigenvalues times the product
of the second coordinates of the eigenpairs, we have $\det T = (1)(13)$;
in other words, $\det T = 13$.

The reason that the operators in the two previous examples have the 
same determinant will become clear after we find a formula (valid on 
both complex and real vector spaces) for computing the determinant 
of an operator from its matrix. 

In this section, we will prove some simple but important properties 
of determinants. In the next section, we will discover how to calculate 
det T from the matrix of T (with respect to an arbitrary basis). We begin 
with a crucial result that has an easy proof with our approach. 

10.14 Proposition: An operator is invertible if and only if its
determinant is nonzero.

Proof: First suppose $V$ is a complex vector space and $T \in \mathcal{L}(V)$.
The operator $T$ is invertible if and only if 0 is not an eigenvalue of $T$.
Clearly this happens if and only if the product of the eigenvalues of $T$
is not 0. Thus $T$ is invertible if and only if $\det T \ne 0$, as desired.

Now suppose $V$ is a real vector space and $T \in \mathcal{L}(V)$. Again, $T$ is
invertible if and only if 0 is not an eigenvalue of $T$. Using the notation
of 10.7, we have

10.15    $\det T = \lambda_1 \cdots \lambda_m \beta_1 \cdots \beta_M,$

where the $\lambda$'s are the eigenvalues of $T$ and the $\beta$'s are the second
coordinates of the eigenpairs of $T$, each repeated according to multiplicity.
For each eigenpair $(\alpha_j, \beta_j)$, we have $\alpha_j^2 < 4\beta_j$. In particular, each $\beta_j$
is positive. This implies (see 10.15) that $\lambda_1 \cdots \lambda_m \ne 0$ if and only if
$\det T \ne 0$. Thus $T$ is invertible if and only if $\det T \ne 0$, as desired. ■

If $T \in \mathcal{L}(V)$ and $\lambda, z \in \mathbf{F}$, then $\lambda$ is an eigenvalue of $T$ if and only if
$z - \lambda$ is an eigenvalue of $zI - T$. This follows from

$$-(T - \lambda I) = (zI - T) - (z - \lambda)I.$$

Raising both sides of this equation to the $\dim V$ power and then taking
null spaces of both sides shows that the multiplicity of $\lambda$ as an
eigenvalue of $T$ equals the multiplicity of $z - \lambda$ as an eigenvalue of $zI - T$.
The next lemma gives the analogous result for eigenpairs. We will use
this lemma to show that the characteristic polynomial can be expressed
as a certain determinant.

Real vector spaces are harder to deal with than complex vector
spaces. The first time you read this chapter, you may want to
concentrate on the basic ideas by considering only complex vector spaces
and ignoring the special procedures needed to deal with
real vector spaces.

10.16 Lemma: Suppose V is a real vector space, $T \in \mathcal{L}(V)$, and
$\alpha, \beta, x \in \mathbf{R}$ with $\alpha^2 < 4\beta$. Then $(\alpha, \beta)$ is an eigenpair of T if and only
if $(-2x - \alpha, x^2 + \alpha x + \beta)$ is an eigenpair of $xI - T$. Furthermore, these
eigenpairs have the same multiplicities.

Proof: First we need to check that $(-2x - \alpha, x^2 + \alpha x + \beta)$ satisfies
the inequality required of an eigenpair. We have

$$\begin{aligned}
(-2x - \alpha)^2 &= 4x^2 + 4\alpha x + \alpha^2 \\
&< 4x^2 + 4\alpha x + 4\beta \\
&= 4(x^2 + \alpha x + \beta).
\end{aligned}$$

Thus $(-2x - \alpha, x^2 + \alpha x + \beta)$ satisfies the required inequality.

Now

$T^2 + \alpha T + \beta I = (xI - T)^2 - (2x + \alpha)(xI - T) + (x^2 + \alpha x + \beta)I$,

as you should verify. Thus $(\alpha, \beta)$ is an eigenpair of T if and only if
$(-2x - \alpha, x^2 + \alpha x + \beta)$ is an eigenpair of $xI - T$. Furthermore, raising
both sides of the equation above to the dim V power and then taking
null spaces of both sides shows that the multiplicities are equal. ■
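The operator identity used in the proof above can be spot-checked on a concrete matrix. The following is only an illustrative sketch: the 2-by-2 integer matrix `T`, the sample values of `alpha`, `beta`, `x`, and all helper names are assumptions chosen for the example, not part of the text.

```python
def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_add(A, B):
    """Entrywise sum of two matrices of the same size."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scale(c, A):
    """Scalar multiple c * A."""
    return [[c * a for a in row] for row in A]

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def lemma_sides(T, alpha, beta, x):
    """Return both sides of T^2 + alpha T + beta I = (xI-T)^2 - (2x+alpha)(xI-T) + (x^2+alpha x+beta) I."""
    I = identity(len(T))
    left = mat_add(mat_add(mat_mul(T, T), scale(alpha, T)), scale(beta, I))
    S = mat_add(scale(x, I), scale(-1, T))  # S = xI - T
    right = mat_add(
        mat_add(mat_mul(S, S), scale(-(2 * x + alpha), S)),
        scale(x * x + alpha * x + beta, I),
    )
    return left, right

# Rotation by 90 degrees: T^2 + I = 0, so (alpha, beta) = (0, 1) is an eigenpair.
T = [[0, -1], [1, 0]]
left, right = lemma_sides(T, alpha=0, beta=1, x=3)
print(left == right)
```

With x = 3 the corresponding pair for xI - T is (-6, 10), and 36 < 40, in line with the inequality checked at the start of the proof.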

Most textbooks take the theorem below as the definition of the
characteristic polynomial. Texts using that approach must spend
considerably more time developing the theory of determinants before they get
to interesting linear algebra.

10.17 Theorem: Suppose $T \in \mathcal{L}(V)$. Then the characteristic
polynomial of T equals $\det(zI - T)$.

Proof: First suppose V is a complex vector space. Let $\lambda_1, \dots, \lambda_n$
denote the eigenvalues of T, repeated according to multiplicity. Thus
for $z \in \mathbf{C}$, the eigenvalues of $zI - T$ are $z - \lambda_1, \dots, z - \lambda_n$, repeated
according to multiplicity. The determinant of $zI - T$ is the product of
these eigenvalues. In other words,

$\det(zI - T) = (z - \lambda_1) \cdots (z - \lambda_n)$.

The right side of the equation above is, by definition, the characteristic
polynomial of T, completing the proof when V is a complex vector
space.

Now suppose V is a real vector space. Let $\lambda_1, \dots, \lambda_m$ denote the
eigenvalues of T and let $(\alpha_1, \beta_1), \dots, (\alpha_M, \beta_M)$ denote the eigenpairs
of T, each repeated according to multiplicity. Thus for $x \in \mathbf{R}$, the
eigenvalues of $xI - T$ are $x - \lambda_1, \dots, x - \lambda_m$ and, by 10.16, the eigenpairs
of $xI - T$ are

$(-2x - \alpha_1, x^2 + \alpha_1 x + \beta_1), \dots, (-2x - \alpha_M, x^2 + \alpha_M x + \beta_M)$,

each repeated according to multiplicity. Hence

$\det(xI - T) = (x - \lambda_1) \cdots (x - \lambda_m)(x^2 + \alpha_1 x + \beta_1) \cdots (x^2 + \alpha_M x + \beta_M)$.

The right side of the equation above is, by definition, the characteristic
polynomial of T, completing the proof when V is a real vector space. ■
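As a rough numerical illustration of the real case, the sketch below compares det(xI - T) with (x - 1)(x^2 + 1) for one sample operator on R^3. It computes determinants of matrices with the permutation-sum formula 10.25 that appears later in this chapter; the matrix `T` and the function names are assumptions made for this example.

```python
from itertools import permutations

def sign(perm):
    """+1 if the list has an even number of out-of-order pairs, -1 if odd."""
    n = len(perm)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])
    return 1 if inversions % 2 == 0 else -1

def det(A):
    """Determinant via the permutation sum 10.25 (indices are 0-based here)."""
    n = len(A)
    total = 0
    for perm in permutations(range(n)):
        term = sign(perm)
        for col in range(n):
            term *= A[perm[col]][col]
        total += term
    return total

def x_minus_T(T, x):
    """Matrix of xI - T."""
    n = len(T)
    return [[(x if i == j else 0) - T[i][j] for j in range(n)] for i in range(n)]

# Sample operator with eigenvalue 1 and eigenpair (alpha, beta) = (0, 1).
T = [[1, 0, 0],
     [0, 0, -1],
     [0, 1, 0]]

def char_poly(x):
    return (x - 1) * (x * x + 0 * x + 1)

# Two cubic polynomials that agree at 11 points are equal.
checks = [det(x_minus_T(T, x)) == char_poly(x) for x in range(-5, 6)]
print(all(checks))
```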

Determinant of a Matrix 

Most of this section is devoted to discovering how to calculate det T
from the matrix of T (with respect to an arbitrary basis). Let's start with
the easiest situation. Suppose V is a complex vector space, $T \in \mathcal{L}(V)$,
and we choose a basis of V with respect to which T has an
upper-triangular matrix. Then, as we noted in the last section, det T equals
the product of the diagonal entries of this matrix. Could such a simple
formula be true in general?

Unfortunately the determinant is more complicated than the trace.
In particular, det T need not equal the product of the diagonal entries
of $\mathcal{M}(T)$ with respect to an arbitrary basis. For example, the operator
on $\mathbf{F}^3$ whose matrix equals 10.8 has determinant 13, as we saw in the
last section. However, the product of the diagonal entries of that matrix
equals 0.

For each square matrix A, we want to define the determinant of A,
denoted det A, in such a way that $\det T = \det \mathcal{M}(T)$ regardless of which
basis is used to compute $\mathcal{M}(T)$. We begin our search for the correct
definition of the determinant of a matrix by calculating the determinants
of some special operators.

Let $c_1, \dots, c_n \in \mathbf{F}$ be nonzero scalars and let $(v_1, \dots, v_n)$ be a basis
of V. Consider the operator $T \in \mathcal{L}(V)$ such that $\mathcal{M}(T, (v_1, \dots, v_n))$
equals

10.18
$$\begin{bmatrix}
0 & & & & c_n \\
c_1 & 0 & & & \\
 & c_2 & 0 & & \\
 & & \ddots & \ddots & \\
 & & & c_{n-1} & 0
\end{bmatrix};$$

here all entries of the matrix are 0 except for the upper-right corner
and along the line just below the diagonal. Let's find the determinant
of T. Note that

$(v_1, Tv_1, T^2 v_1, \dots, T^{n-1} v_1) = (v_1, c_1 v_2, c_1 c_2 v_3, \dots, c_1 \cdots c_{n-1} v_n)$.


Recall that if the 
minimal polynomial of 
an operator $T \in \mathcal{L}(V)$
has degree dim V, then 
the characteristic 
polynomial of T equals 
the minimal polynomial 
of T. Computing the 
minimal polynomial is 
often an efficient 
method of finding the 
characteristic 
polynomial. 


Thus $(v_1, Tv_1, \dots, T^{n-1} v_1)$ is linearly independent (the c's are all
nonzero). Hence if p is a nonzero polynomial with degree at most n - 1,
then $p(T)v_1 \neq 0$. In other words, the minimal polynomial of T cannot
have degree less than n. As you should verify, $T^n v_j = c_1 \cdots c_n v_j$ for
each j, and hence $T^n = c_1 \cdots c_n I$. Thus $z^n - c_1 \cdots c_n$ is the minimal
polynomial of T. Because n = dim V, we see that $z^n - c_1 \cdots c_n$ is also
the characteristic polynomial of T. Multiplying the constant term of
this polynomial by $(-1)^n$, we get

10.19    $\det T = (-1)^{n-1} c_1 \cdots c_n$.

If some $c_j$ equals 0, then clearly T is not invertible, so det T = 0 and
the same formula holds. Thus in order to have $\det T = \det \mathcal{M}(T)$, we
will have to make the determinant of 10.18 equal to $(-1)^{n-1} c_1 \cdots c_n$.
However, we do not yet have enough evidence to make a reasonable
guess about the proper definition of the determinant of an arbitrary
square matrix.
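The requirement on matrices of the form 10.18 can be checked against the definition of the determinant of a matrix that the text arrives at later (the permutation sum 10.25). This is a small sketch under that assumption; `cyclic_matrix` and the sample scalars are hypothetical names and values used only for illustration.

```python
from itertools import permutations

def sign(perm):
    """+1 if the list has an even number of out-of-order pairs, -1 if odd."""
    n = len(perm)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])
    return 1 if inversions % 2 == 0 else -1

def det(A):
    """Determinant via the permutation sum 10.25 (indices are 0-based here)."""
    n = len(A)
    total = 0
    for perm in permutations(range(n)):
        term = sign(perm)
        for col in range(n):
            term *= A[perm[col]][col]
        total += term
    return total

def cyclic_matrix(c):
    """Matrix of the form 10.18 built from the scalars c_1, ..., c_n."""
    n = len(c)
    A = [[0] * n for _ in range(n)]
    for k in range(n - 1):
        A[k + 1][k] = c[k]      # c_1, ..., c_{n-1} just below the diagonal
    A[0][n - 1] = c[n - 1]      # c_n in the upper-right corner
    return A

def formula_10_19(c):
    """(-1)^(n-1) c_1 ... c_n."""
    product = 1
    for value in c:
        product *= value
    return (-1) ** (len(c) - 1) * product

c = [2, 3, 5, 7]
print(det(cyclic_matrix(c)), formula_10_19(c))
```

The same comparison also goes through when one of the scalars is 0, matching the remark that the formula still holds in that case.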

To compute the determinants of a more complicated class of operators,
we introduce the notion of permutation. A permutation of
$(1, \dots, n)$ is a list $(m_1, \dots, m_n)$ that contains each of the numbers
$1, \dots, n$ exactly once. The set of all permutations of $(1, \dots, n)$ is
denoted $\operatorname{perm} n$. For example, $(2, 3, \dots, n, 1) \in \operatorname{perm} n$. You should think
of an element of $\operatorname{perm} n$ as a rearrangement of the first n integers.

For simplicity we will work with matrices with complex entries (at
this stage we are providing only motivation; formal proofs will come
later). Let $c_1, \dots, c_n \in \mathbf{C}$ and let $(v_1, \dots, v_n)$ be a basis of V, which
we are assuming is a complex vector space. Consider a permutation
$(p_1, \dots, p_n) \in \operatorname{perm} n$ that can be obtained as follows: break $(1, \dots, n)$
into lists of consecutive integers and in each list move the first term to
the end of that list. For example, taking n = 9, the permutation

10.20    (2, 3, 1, 5, 6, 7, 4, 9, 8)

is obtained from (1, 2, 3), (4, 5, 6, 7), (8, 9) by moving the first term of
each of these lists to the end, producing (2, 3, 1), (5, 6, 7, 4), (9, 8), and
then putting these together to form 10.20. Let $T \in \mathcal{L}(V)$ be the operator
such that

10.21    $Tv_k = c_k v_{p_k}$

for $k = 1, \dots, n$. We want to find a formula for det T. This generalizes
our earlier example because if $(p_1, \dots, p_n)$ happens to be the
permutation $(2, 3, \dots, n, 1)$, then the operator T whose matrix equals 10.18 is
the same as the operator T defined by 10.21.

With respect to the basis $(v_1, \dots, v_n)$, the matrix of the operator T
defined by 10.21 is a block diagonal matrix

$$A = \begin{bmatrix} A_1 & & 0 \\ & \ddots & \\ 0 & & A_M \end{bmatrix},$$

where each block is a square matrix of the form 10.18. The eigenvalues
of T equal the union of the eigenvalues of $A_1, \dots, A_M$ (see Exercise 3 in
Chapter 9). Recalling that the determinant of an operator on a complex
vector space is the product of the eigenvalues, we see that our definition
of the determinant of a square matrix should force

$\det A = (\det A_1) \cdots (\det A_M)$.

However, we already know how to compute the determinant of each $A_j$,
which has the same form as 10.18 (of course with a different value of n).
Putting all this together, we see that we should have

$\det A = (-1)^{n_1 - 1} \cdots (-1)^{n_M - 1} c_1 \cdots c_n$,

where $A_j$ has size $n_j$-by-$n_j$. The number $(-1)^{n_1 - 1} \cdots (-1)^{n_M - 1}$ is called
the sign of the permutation $(p_1, \dots, p_n)$, denoted $\operatorname{sign}(p_1, \dots, p_n)$ (this
is a temporary definition that we will change to an equivalent definition
later, when we define the sign of an arbitrary permutation).





To put this into a form that does not depend on the particular
permutation $(p_1, \dots, p_n)$, let $a_{j,k}$ denote the entry in row j, column k, of A;
thus

$$a_{j,k} = \begin{cases} 0 & \text{if } j \neq p_k; \\ c_k & \text{if } j = p_k. \end{cases}$$

Then

10.22    $$\det A = \sum_{(m_1, \dots, m_n) \in \operatorname{perm} n} \bigl(\operatorname{sign}(m_1, \dots, m_n)\bigr) a_{m_1,1} \cdots a_{m_n,n},$$

because each summand is 0 except the one corresponding to the
permutation $(p_1, \dots, p_n)$.

Some texts use the
unnecessarily fancy
term signum, which
means the same
as sign.

Consider now an arbitrary matrix A with entry $a_{j,k}$ in row j,
column k. Using the paragraph above as motivation, we guess that det A
should be defined by 10.22. This will turn out to be correct. We can
now dispense with the motivation and begin the more formal approach.
First we will need to define the sign of an arbitrary permutation.

The sign of a permutation $(m_1, \dots, m_n)$ is defined to be 1 if the
number of pairs of integers (j, k) with $1 \le j < k \le n$ such that j
appears after k in the list $(m_1, \dots, m_n)$ is even and -1 if the number of
such pairs is odd. In other words, the sign of a permutation equals 1 if
the natural order has been changed an even number of times and equals
-1 if the natural order has been changed an odd number of times. For
example, in the permutation $(2, 3, \dots, n, 1)$ the only pairs (j, k) with
j < k that appear with changed order are $(1, 2), (1, 3), \dots, (1, n)$;
because we have n - 1 such pairs, the sign of this permutation equals
$(-1)^{n-1}$ (note that the same quantity appeared in 10.19).
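The definition above translates directly into a count of out-of-order pairs. The sketch below is one possible rendering, not a prescribed implementation; `sign` and `block_sign` are hypothetical helper names. It also compares the result with the temporary block-based definition for the permutation 10.20, whose blocks have sizes 3, 4, and 2.

```python
def sign(perm):
    """Sign of a permutation of (1, ..., n), given as a tuple or list.

    Counts the pairs of positions whose entries appear out of their
    natural order; an even count gives 1, an odd count gives -1.
    """
    n = len(perm)
    out_of_order = sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if perm[i] > perm[j]
    )
    return 1 if out_of_order % 2 == 0 else -1

def block_sign(block_sizes):
    """Temporary definition: the product of (-1)^(n_j - 1) over the blocks."""
    result = 1
    for size in block_sizes:
        result *= (-1) ** (size - 1)
    return result

print(sign((2, 3, 1, 5, 6, 7, 4, 9, 8)), block_sign([3, 4, 2]))
```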

The permutation (2, 1, 3, 4), which is obtained from the permutation
(1, 2, 3, 4) by interchanging the first two entries, has sign -1. The next
lemma shows that interchanging any two entries of any permutation
changes the sign of the permutation.


10.23 Lemma: Interchanging two entries in a permutation multiplies 
the sign of the permutation by -1. 


Proof: Suppose we have two permutations, where the second
permutation is obtained from the first by interchanging two entries. If the
two entries that we interchanged were in their natural order in the first
permutation, then they no longer are in the second permutation, and
vice versa, for a net change (so far) of 1 or -1 (both odd numbers) in
the number of pairs not in their natural order.

Consider each entry between the two interchanged entries. If an
intermediate entry was originally in the natural order with respect to the
first interchanged entry, then it no longer is, and vice versa. Similarly, 
if an intermediate entry was originally in the natural order with respect 
to the second interchanged entry, then it no longer is, and vice versa. 
Thus the net change for each intermediate entry in the number of pairs 
not in their natural order is 2, 0, or -2 (all even numbers). 

For all the other entries, there is no change in the number of pairs 
not in their natural order. Thus the total net change in the number of 
pairs not in their natural order is an odd number. Thus the sign of the 
second permutation equals -1 times the sign of the first permutation. ■ 
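For small n the lemma can also be confirmed by brute force. The sketch below checks every permutation and every pair of positions; the function names are assumptions made for this illustration, and such a check is of course no substitute for the proof.

```python
from itertools import combinations, permutations

def sign(perm):
    """+1 if the list has an even number of out-of-order pairs, -1 if odd."""
    n = len(perm)
    out_of_order = sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])
    return 1 if out_of_order % 2 == 0 else -1

def interchange(perm, i, j):
    """Copy of perm with the entries in positions i and j interchanged."""
    entries = list(perm)
    entries[i], entries[j] = entries[j], entries[i]
    return tuple(entries)

def lemma_holds(n):
    """True if every interchange flips the sign, for all permutations of (1, ..., n)."""
    return all(
        sign(interchange(p, i, j)) == -sign(p)
        for p in permutations(range(1, n + 1))
        for i, j in combinations(range(n), 2)
    )

print(lemma_holds(4))
```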


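If A is an n-by-n matrix

10.24    $$A = \begin{bmatrix} a_{1,1} & \dots & a_{1,n} \\ \vdots & & \vdots \\ a_{n,1} & \dots & a_{n,n} \end{bmatrix},$$

then the determinant of A, denoted det A, is defined by

10.25    $$\det A = \sum_{(m_1, \dots, m_n) \in \operatorname{perm} n} \bigl(\operatorname{sign}(m_1, \dots, m_n)\bigr) a_{m_1,1} \cdots a_{m_n,n}.$$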


Our motivation for this 
definition comes 
from 10.22. 


For example, if A is the 1-by-1 matrix $[a_{1,1}]$, then $\det A = a_{1,1}$
because $\operatorname{perm} 1$ has only one element, namely, (1), which has sign 1. For
a more interesting example, consider a typical 2-by-2 matrix. Clearly
$\operatorname{perm} 2$ has only two elements, namely, (1, 2), which has sign 1, and
(2, 1), which has sign -1. Thus

10.26    $$\det \begin{bmatrix} a_{1,1} & a_{1,2} \\ a_{2,1} & a_{2,2} \end{bmatrix} = a_{1,1} a_{2,2} - a_{2,1} a_{1,2}.$$


To make sure you understand this process, you should now find the
formula for the determinant of the 3-by-3 matrix

$$\begin{bmatrix} a_{1,1} & a_{1,2} & a_{1,3} \\ a_{2,1} & a_{2,2} & a_{2,3} \\ a_{3,1} & a_{3,2} & a_{3,3} \end{bmatrix}$$

using just the definition given above (do this even if you already know
the answer).
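Definition 10.25 can be transcribed almost literally into a short program, which may help when checking a hand computation. This is a sketch for small matrices only (the sum has n! terms); the names `sign` and `det` and the sample matrices are assumptions of this example.

```python
from itertools import permutations

def sign(perm):
    """+1 if the list has an even number of out-of-order pairs, -1 if odd."""
    n = len(perm)
    out_of_order = sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])
    return 1 if out_of_order % 2 == 0 else -1

def det(A):
    """Sum over all permutations (m_1, ..., m_n) of sign * a_{m_1,1} ... a_{m_n,n}.

    A is a list of rows; indices are 0-based, so perm[k] is the row
    chosen for column k.
    """
    n = len(A)
    total = 0
    for perm in permutations(range(n)):
        term = sign(perm)
        for col in range(n):
            term *= A[perm[col]][col]
        total += term
    return total

print(det([[1, 2], [3, 4]]))                    # agrees with 10.26: 1*4 - 3*2
print(det([[2, 0, 1], [1, 3, 2], [1, 1, 4]]))   # a sample 3-by-3 matrix
```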


The set perm 3
contains 6 elements. In
general, perm n
contains n! elements.
Note that n! rapidly
grows large as n
increases.

Let's compute the determinant of an upper-triangular matrix

$$A = \begin{bmatrix} a_{1,1} & & * \\ & \ddots & \\ 0 & & a_{n,n} \end{bmatrix}.$$

The permutation $(1, 2, \dots, n)$ has sign 1 and thus contributes a term
of $a_{1,1} \cdots a_{n,n}$ to the sum 10.25 defining det A. Any other permutation
$(m_1, \dots, m_n) \in \operatorname{perm} n$ contains at least one entry $m_j$ with $m_j > j$,
which means that $a_{m_j, j} = 0$ (because A is upper triangular). Thus all
the other terms in the sum 10.25 defining det A make no contribution.
Hence $\det A = a_{1,1} \cdots a_{n,n}$. In other words, the determinant of an
upper-triangular matrix equals the product of the diagonal entries. In
particular, this means that if V is a complex vector space, $T \in \mathcal{L}(V)$,
and we choose a basis of V with respect to which $\mathcal{M}(T)$ is upper
triangular, then $\det T = \det \mathcal{M}(T)$. Our goal is to prove that this holds for
every basis of V, not just bases that give upper-triangular matrices.

Generalizing the computation from the paragraph above, next we
will show that if A is a block upper-triangular matrix

$$A = \begin{bmatrix} A_1 & & * \\ & \ddots & \\ 0 & & A_m \end{bmatrix},$$

where each $A_j$ is a 1-by-1 or 2-by-2 matrix, then

10.27    $\det A = (\det A_1) \cdots (\det A_m)$.

To prove this, consider an element of $\operatorname{perm} n$. If this permutation
moves an index corresponding to a 1-by-1 block on the diagonal
anyplace else, then the permutation makes no contribution to the sum
10.25 defining det A (because A is block upper triangular). For a pair
of indices corresponding to a 2-by-2 block on the diagonal, the
permutation must either leave these indices fixed or interchange them;
otherwise again the permutation makes no contribution to the sum 10.25
defining det A (because A is block upper triangular). These
observations, along with the formula 10.26 for the determinant of a 2-by-2
matrix, lead to 10.27. In particular, if V is a real vector space, $T \in \mathcal{L}(V)$,
and we choose a basis of V with respect to which $\mathcal{M}(T)$ is a block
upper-triangular matrix with 1-by-1 and 2-by-2 blocks on the diagonal
as in 9.9, then $\det T = \det \mathcal{M}(T)$.

Our goal is to prove that $\det T = \det \mathcal{M}(T)$ for every $T \in \mathcal{L}(V)$ and
every basis of V. To do this, we will need to develop some properties
of determinants of matrices. The lemma below is the first of the
properties we will need.

10.28 Lemma: Suppose A is a square matrix. If B is the matrix
obtained from A by interchanging two columns, then

$\det A = -\det B$.

An entire book could
be devoted just to
deriving properties of
determinants.
Fortunately we need
only a few of the basic
properties.

Proof: Suppose A is given by 10.24 and B is obtained from A by
interchanging two columns. Think of the sum 10.25 defining det A and
the corresponding sum defining det B. The same products of a's appear
in both sums, though they correspond to different permutations. The
permutation corresponding to a given product of a's when computing
det B is obtained by interchanging two entries in the corresponding
permutation when computing det A, thus multiplying the sign of the
permutation by -1 (see 10.23). Hence $\det A = -\det B$. ■

If $T \in \mathcal{L}(V)$ and the matrix of T (with respect to some basis) has two
equal columns, then T is not injective and hence det T = 0. Though
this comment makes the next lemma plausible, it cannot be used in the
proof because we do not yet know that $\det T = \det \mathcal{M}(T)$.

10.29 Lemma: If A is a square matrix that has two equal columns, 
then det A = 0. 


Proof: Suppose A is a square matrix that has two equal columns. 
Interchanging the two equal columns of A gives the original matrix A. 
Thus from 10.28 (with B = A), we have det A = - det A, which implies 
that det A = 0. ■ 

This section is long, so let’s pause for a paragraph. The symbols ^ 
that appear on the first page of each chapter are decorations intended 
to take up space so that the first section of the chapter can start on the 
next page. Chapter 1 has one of these symbols, Chapter 2 has two of 
them, and so on. The symbols get smaller with each chapter. What you 
may not have noticed is that the sum of the areas of the symbols at the 
beginning of each chapter is the same for all chapters. For example, the 
diameter of each symbol at the beginning of Chapter 10 equals $1/\sqrt{10}$
times the diameter of the symbol in Chapter 1.

Some texts define the
determinant to be the
function defined on the
square matrices that is
linear as a function of
each column separately
and that satisfies 10.30
and det I = 1. To prove
that such a function
exists and that it is
unique takes a
nontrivial amount of
work.


We need to introduce notation that will allow us to represent a
matrix in terms of its columns. If A is an n-by-n matrix

$$A = \begin{bmatrix} a_{1,1} & \dots & a_{1,n} \\ \vdots & & \vdots \\ a_{n,1} & \dots & a_{n,n} \end{bmatrix},$$

then we can think of the kth column of A as an n-by-1 matrix

$$a_k = \begin{bmatrix} a_{1,k} \\ \vdots \\ a_{n,k} \end{bmatrix}.$$

We will write A in the form

$[\, a_1 \ \dots \ a_n \,]$,

with the understanding that $a_k$ denotes the kth column of A. With this
notation, note that $a_{j,k}$, with two subscripts, denotes an entry of A,
whereas $a_k$, with one subscript, denotes a column of A.

The next lemma shows that a permutation of the columns of a matrix 
changes the determinant by a factor of the sign of the permutation. 


10.30 Lemma: Suppose $A = [\, a_1 \ \dots \ a_n \,]$ is an n-by-n matrix.
If $(m_1, \dots, m_n)$ is a permutation, then

$\det[\, a_{m_1} \ \dots \ a_{m_n} \,] = \bigl(\operatorname{sign}(m_1, \dots, m_n)\bigr) \det A$.

Proof: Suppose $(m_1, \dots, m_n) \in \operatorname{perm} n$. We can transform the
matrix $[\, a_{m_1} \ \dots \ a_{m_n} \,]$ into A through a series of steps. In each
step, we interchange two columns and hence multiply the determinant
by -1 (see 10.28). The number of steps needed equals the number
of steps needed to transform the permutation $(m_1, \dots, m_n)$ into the
permutation $(1, \dots, n)$ by interchanging two entries in each step. The
proof is completed by noting that the number of such steps is even if
$(m_1, \dots, m_n)$ has sign 1, odd if $(m_1, \dots, m_n)$ has sign -1 (this follows
from 10.23, along with the observation that the permutation $(1, \dots, n)$
has sign 1). ■
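The lemma can be tried out on a sample matrix by rearranging its columns in every possible way. In the sketch below, determinants are computed from the definition 10.25; `permute_columns` and the sample matrix are hypothetical choices for illustration. Lemma 10.28 is the special case of a single interchange, and a matrix with two equal columns (10.29) gives 0.

```python
from itertools import permutations

def sign(perm):
    """+1 if the list has an even number of out-of-order pairs, -1 if odd."""
    n = len(perm)
    out_of_order = sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])
    return 1 if out_of_order % 2 == 0 else -1

def det(A):
    """Determinant via the permutation sum 10.25 (indices are 0-based here)."""
    n = len(A)
    total = 0
    for perm in permutations(range(n)):
        term = sign(perm)
        for col in range(n):
            term *= A[perm[col]][col]
        total += term
    return total

def permute_columns(A, perm):
    """Matrix whose kth column is column m_k of A, where perm = (m_1, ..., m_n) is 1-based."""
    return [[row[m - 1] for m in perm] for row in A]

A = [[2, 0, 1],
     [1, 3, 2],
     [1, 1, 4]]

results = [
    det(permute_columns(A, p)) == sign(p) * det(A)
    for p in permutations((1, 2, 3))
]
print(all(results))
```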

Let $A = [\, a_1 \ \dots \ a_n \,]$. For $1 \le k \le n$, think of all columns of A
except the kth column as fixed. We have

$\det A = \det[\, a_1 \ \dots \ a_k \ \dots \ a_n \,]$,

and we can think of det A as a function of the kth column $a_k$. This
function, which takes $a_k$ to the determinant above, is a linear map
from the vector space of n-by-1 matrices with entries in F to F. The
linearity follows easily from 10.25, where each term in the sum contains
precisely one entry from the kth column of A.

Now we are ready to prove one of the key properties about
determinants of square matrices. This property will enable us to connect the
determinant of an operator with the determinant of its matrix. Note 
that this proof is considerably more complicated than the proof of the 
corresponding result about the trace (see 10.9). 

10.31 Theorem: If A and B are square matrices of the same size,
then

$\det(AB) = \det(BA) = (\det A)(\det B)$.

This theorem was first
proved in 1812 by the
French mathematicians
Jacques Binet and
Augustin-Louis Cauchy.

Proof: Let $A = [\, a_1 \ \dots \ a_n \,]$, where each $a_k$ is an n-by-1 column
of A. Let

$$B = \begin{bmatrix} b_{1,1} & \dots & b_{1,n} \\ \vdots & & \vdots \\ b_{n,1} & \dots & b_{n,n} \end{bmatrix} = [\, b_1 \ \dots \ b_n \,],$$

where each $b_k$ is an n-by-1 column of B. Let $e_k$ denote the n-by-1 matrix
that equals 1 in the kth row and 0 elsewhere. Note that $Ae_k = a_k$ and
$Be_k = b_k$. Furthermore, $b_k = \sum_{m=1}^{n} b_{m,k} e_m$.

First we will prove that $\det(AB) = (\det A)(\det B)$. A moment's
thought about the definition of matrix multiplication shows that
$AB = [\, Ab_1 \ \dots \ Ab_n \,]$. Thus

$$\begin{aligned}
\det(AB) &= \det[\, Ab_1 \ \dots \ Ab_n \,] \\
&= \det\Bigl[\, A\Bigl(\sum_{m_1=1}^{n} b_{m_1,1} e_{m_1}\Bigr) \ \dots \ A\Bigl(\sum_{m_n=1}^{n} b_{m_n,n} e_{m_n}\Bigr) \,\Bigr] \\
&= \det\Bigl[\, \sum_{m_1=1}^{n} b_{m_1,1} Ae_{m_1} \ \dots \ \sum_{m_n=1}^{n} b_{m_n,n} Ae_{m_n} \,\Bigr] \\
&= \sum_{m_1=1}^{n} \cdots \sum_{m_n=1}^{n} b_{m_1,1} \cdots b_{m_n,n} \det[\, Ae_{m_1} \ \dots \ Ae_{m_n} \,],
\end{aligned}$$

where the last equality comes from repeated applications of the
linearity of det as a function of one column at a time. In the last sum above,
all terms in which $m_j = m_k$ for some $j \neq k$ can be ignored because the
determinant of a matrix with two equal columns is 0 (by 10.29). Thus
instead of summing over all $m_1, \dots, m_n$ with each $m_j$ taking on values
$1, \dots, n$, we can sum just over the permutations, where the $m_j$'s have
distinct values. In other words,

$$\begin{aligned}
\det(AB) &= \sum_{(m_1, \dots, m_n) \in \operatorname{perm} n} b_{m_1,1} \cdots b_{m_n,n} \det[\, Ae_{m_1} \ \dots \ Ae_{m_n} \,] \\
&= \sum_{(m_1, \dots, m_n) \in \operatorname{perm} n} b_{m_1,1} \cdots b_{m_n,n} \bigl(\operatorname{sign}(m_1, \dots, m_n)\bigr) \det A \\
&= (\det A) \sum_{(m_1, \dots, m_n) \in \operatorname{perm} n} \bigl(\operatorname{sign}(m_1, \dots, m_n)\bigr) b_{m_1,1} \cdots b_{m_n,n} \\
&= (\det A)(\det B),
\end{aligned}$$

where the second equality comes from 10.30.

In the paragraph above, we proved that $\det(AB) = (\det A)(\det B)$.
Interchanging the roles of A and B, we have $\det(BA) = (\det B)(\det A)$.
The last equation can be rewritten as $\det(BA) = (\det A)(\det B)$,
completing the proof. ■
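A quick numerical check of the theorem is sketched below for a pair of 2-by-2 matrices that do not commute, so AB and BA differ even though their determinants agree. The matrices and helper names are assumptions chosen for this example.

```python
from itertools import permutations

def sign(perm):
    """+1 if the list has an even number of out-of-order pairs, -1 if odd."""
    n = len(perm)
    out_of_order = sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])
    return 1 if out_of_order % 2 == 0 else -1

def det(A):
    """Determinant via the permutation sum 10.25 (indices are 0-based here)."""
    n = len(A)
    total = 0
    for perm in permutations(range(n)):
        term = sign(perm)
        for col in range(n):
            term *= A[perm[col]][col]
        total += term
    return total

def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[1, 2], [3, 4]]
B = [[0, 1], [5, -2]]

AB = mat_mul(A, B)
BA = mat_mul(B, A)
print(AB != BA, det(AB), det(BA), det(A) * det(B))
```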

Now we can prove that the determinant of the matrix of an
operator is independent of the basis with respect to which the matrix is
computed. 

10.32 Corollary: Suppose $T \in \mathcal{L}(V)$. If $(u_1, \dots, u_n)$ and $(v_1, \dots, v_n)$
are bases of V, then

$\det \mathcal{M}(T, (u_1, \dots, u_n)) = \det \mathcal{M}(T, (v_1, \dots, v_n))$.

Note the similarity of
this proof to the proof
of the analogous result
about the trace
(see 10.10).

Proof: Suppose $(u_1, \dots, u_n)$ and $(v_1, \dots, v_n)$ are bases of V. Let
$A = \mathcal{M}(I, (u_1, \dots, u_n), (v_1, \dots, v_n))$. Then

$$\begin{aligned}
\det \mathcal{M}(T, (u_1, \dots, u_n)) &= \det\bigl(A^{-1}\bigl(\mathcal{M}(T, (v_1, \dots, v_n))A\bigr)\bigr) \\
&= \det\bigl(\bigl(\mathcal{M}(T, (v_1, \dots, v_n))A\bigr)A^{-1}\bigr) \\
&= \det \mathcal{M}(T, (v_1, \dots, v_n)),
\end{aligned}$$

where the first equality follows from 10.3 and the second equality
follows from 10.31. The third equality completes the proof. ■

The theorem below states that the determinant of an operator equals
the determinant of the matrix of the operator. This theorem does not
specify a basis because, by the corollary above, the determinant of the
matrix of an operator is the same for every choice of basis.
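The basis independence in 10.32 can be seen on a small example by conjugating one matrix with an invertible change-of-basis matrix. The sketch below uses 2-by-2 integer matrices so that the inverse is exact; `M_v`, `A`, `A_inv`, and the helper names are assumptions made for illustration.

```python
def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def det2(A):
    """Determinant of a 2-by-2 matrix, as in formula 10.26."""
    return A[0][0] * A[1][1] - A[1][0] * A[0][1]

M_v = [[2, 1], [3, 4]]      # matrix of an operator in one basis
A = [[1, 1], [0, 1]]        # change-of-basis matrix
A_inv = [[1, -1], [0, 1]]   # its inverse

# Matrix of the same operator in the other basis, as in 10.3.
M_u = mat_mul(A_inv, mat_mul(M_v, A))

print(M_u, det2(M_u), det2(M_v))
```

The two matrices have different entries (and different diagonal products) but the same determinant.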

10.33 Theorem: If $T \in \mathcal{L}(V)$, then $\det T = \det \mathcal{M}(T)$.

Proof: Let $T \in \mathcal{L}(V)$. As noted above, 10.32 implies that $\det \mathcal{M}(T)$
is independent of which basis of V we choose. Thus to show that

$\det T = \det \mathcal{M}(T)$

for every basis of V, we need only show that the equation above holds
for some basis of V. We already did this (on page 230), choosing a basis
of V with respect to which $\mathcal{M}(T)$ is an upper-triangular matrix (if V is a
complex vector space) or an appropriate block upper-triangular matrix
(if V is a real vector space). ■

If we know the matrix of an operator on a complex vector space, the
theorem above allows us to find the product of all the eigenvalues
without finding any of the eigenvalues. For example, consider the operator
on $\mathbf{C}^5$ whose matrix is

$$\begin{bmatrix}
0 & 0 & 0 & 0 & -3 \\
1 & 0 & 0 & 0 & 6 \\
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 0
\end{bmatrix}.$$

No one knows an exact formula for any of the eigenvalues of this
operator. However, we do know that the product of the eigenvalues equals -3
because the determinant of the matrix above equals -3.
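The value -3 can be confirmed directly from definition 10.25, since only one of the 120 permutations gives a nonzero product for this matrix. The sketch below is illustrative; the helper names are assumptions.

```python
from itertools import permutations

def sign(perm):
    """+1 if the list has an even number of out-of-order pairs, -1 if odd."""
    n = len(perm)
    out_of_order = sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])
    return 1 if out_of_order % 2 == 0 else -1

def det(A):
    """Determinant via the permutation sum 10.25 (indices are 0-based here)."""
    n = len(A)
    total = 0
    for perm in permutations(range(n)):
        term = sign(perm)
        for col in range(n):
            term *= A[perm[col]][col]
        total += term
    return total

M = [[0, 0, 0, 0, -3],
     [1, 0, 0, 0, 6],
     [0, 1, 0, 0, 0],
     [0, 0, 1, 0, 0],
     [0, 0, 0, 1, 0]]

print(det(M))
```

The entry 6 never contributes: using it would require row 2 for column 5, but column 1 has its only nonzero entry in row 2 as well.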

The theorem above also allows us easily to prove some useful
properties about determinants of operators by shifting to the language of
determinants of matrices, where certain properties have already been
proved or are obvious. We carry out this procedure in the next
corollary.

10.34 Corollary: If $S, T \in \mathcal{L}(V)$, then

$\det(ST) = \det(TS) = (\det S)(\det T)$.

Most applied
mathematicians agree
that determinants
should rarely be used
in serious numeric
calculations.

Proof: Suppose $S, T \in \mathcal{L}(V)$. Choose any basis of V. Then

$$\begin{aligned}
\det(ST) &= \det \mathcal{M}(ST) \\
&= \det\bigl(\mathcal{M}(S)\mathcal{M}(T)\bigr) \\
&= \bigl(\det \mathcal{M}(S)\bigr)\bigl(\det \mathcal{M}(T)\bigr) \\
&= (\det S)(\det T),
\end{aligned}$$

where the first and last equalities come from 10.33 and the third
equality comes from 10.31.

In the paragraph above, we proved that $\det(ST) = (\det S)(\det T)$.
Interchanging the roles of S and T, we have $\det(TS) = (\det T)(\det S)$.
Because multiplication of elements of F is commutative, the last equation
can be rewritten as $\det(TS) = (\det S)(\det T)$, completing the proof. ■

Volume

We proved the basic results of linear algebra before introducing
determinants in this final chapter. Though determinants have value as a
research tool in more advanced subjects, they play little role in basic 
linear algebra (when the subject is done right). Determinants do have 
one important application in undergraduate mathematics, namely, in 
computing certain volumes and integrals. In this final section we will 
use the linear algebra we have learned to make clear the connection 
between determinants and these applications. Thus we will be dealing 
with a part of analysis that uses linear algebra. 

We begin with some purely linear algebra results that will be
useful when investigating volumes. Recall that an isometry on an
inner-product space is an operator that preserves norms. The next result
shows that every isometry has determinant with absolute value 1.

10.35 Proposition: Suppose that V is an inner-product space. If
$S \in \mathcal{L}(V)$ is an isometry, then $|\det S| = 1$.

Proof: Suppose $S \in \mathcal{L}(V)$ is an isometry. First consider the case
where V is a complex inner-product space. Then all the eigenvalues of S
have absolute value 1 (by 7.37). Thus the product of the eigenvalues
of S, counting multiplicity, has absolute value one. In other words,
$|\det S| = 1$, as desired.

Now suppose V is a real inner-product space. Then there is an
orthonormal basis of V with respect to which S has a block diagonal matrix,
where each block on the diagonal is a 1-by-1 matrix containing 1 or -1
or a 2-by-2 matrix of the form

10.36    $$\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix},$$

with $\theta \in (0, \pi)$ (see 7.38). Note that the constant term of the
characteristic polynomial of each matrix of the form 10.36 equals 1 (because
$\cos^2\theta + \sin^2\theta = 1$). Hence the second coordinate of every eigenpair
of S equals 1. Thus the determinant of S is the product of 1's and -1's.
In particular, $|\det S| = 1$, as desired. ■


Suppose V is a real inner-product space and $S \in \mathcal{L}(V)$ is an isometry.
By the proposition above, the determinant of S equals 1 or -1. Note
that

$\{v \in V : Sv = -v\}$

is the subspace of V consisting of all eigenvectors of S corresponding
to the eigenvalue -1 (or is the subspace {0} if -1 is not an eigenvalue
of S). Thinking geometrically, we could say that this is the subspace
on which S reverses direction. A careful examination of the proof of
the last proposition shows that det S = 1 if this subspace has even
dimension and det S = -1 if this subspace has odd dimension.

A self-adjoint operator on a real inner-product space has no
eigenpairs (by 7.11). Thus the determinant of a self-adjoint operator on a
real inner-product space equals the product of its eigenvalues,
counting multiplicity (of course, this holds for any operator, self-adjoint or
not, on a complex vector space).

Recall that if V is an inner-product space and $T \in \mathcal{L}(V)$, then $T^*T$
is a positive operator and hence has a unique positive square root,
denoted $\sqrt{T^*T}$ (see 7.27 and 7.28). Because $\sqrt{T^*T}$ is positive, all its
eigenvalues are nonnegative (again, see 7.27), and hence its determinant is
nonnegative. Thus in the corollary below, taking the absolute value of
$\det \sqrt{T^*T}$ would be superfluous.


10.37 Corollary: Suppose V is an inner-product space. If $T \in \mathcal{L}(V)$,
then

$|\det T| = \det \sqrt{T^*T}$.

Another proof of this
corollary is suggested
in Exercise 24 in this
chapter.

Proof: Suppose $T \in \mathcal{L}(V)$. By the polar decomposition (7.41), there
is an isometry $S \in \mathcal{L}(V)$ such that

$T = S\sqrt{T^*T}$.

Thus

$$\begin{aligned}
|\det T| &= |\det S| \det \sqrt{T^*T} \\
&= \det \sqrt{T^*T},
\end{aligned}$$

where the first equality follows from 10.34 and the second equality
follows from 10.35. ■


We are not formally 
defining the phrase 
“reverses direction” 
because these 
comments are meant to 
be an intuitive aid to 
our understanding, not 
rigorous mathematics. 


Suppose V is a real inner-product space and $T \in \mathcal{L}(V)$ is invertible.
Then det T is either positive or negative. A careful examination of the
proof of the corollary above can help us attach a geometric meaning
to whichever of these possibilities holds. To see this, first apply the
real spectral theorem (7.13) to the positive operator $\sqrt{T^*T}$, getting an
orthonormal basis $(e_1, \dots, e_n)$ of V such that $\sqrt{T^*T} e_j = \lambda_j e_j$, where
$\lambda_1, \dots, \lambda_n$ are the eigenvalues of $\sqrt{T^*T}$, repeated according to
multiplicity. Because each $\lambda_j$ is positive, $\sqrt{T^*T}$ never reverses direction.
Now consider the polar decomposition

$T = S\sqrt{T^*T}$,

where $S \in \mathcal{L}(V)$ is an isometry. Then $\det T = (\det S)(\det \sqrt{T^*T})$. Thus
whether det T is positive or negative depends on whether det S is
positive or negative. As we saw earlier, this depends on whether the space
on which S reverses direction has even or odd dimension. Because
T is the product of S and an operator that never reverses direction
(namely, $\sqrt{T^*T}$), we can reasonably say that whether det T is positive
or negative depends on whether T reverses vectors an even or an odd
number of times.

Now we turn to the question of volume, where we will consider only
the real inner-product space $\mathbf{R}^n$ (with its standard inner product). We
would like to assign to each subset $\Omega$ of $\mathbf{R}^n$ its n-dimensional volume,
denoted volume $\Omega$ (when n = 2, this is usually called area instead of
volume). We begin with cubes, where we have a good intuitive notion of
volume. The cube in $\mathbf{R}^n$ with side length r and vertex $(x_1, \dots, x_n) \in \mathbf{R}^n$
is the set

$\{(y_1, \dots, y_n) \in \mathbf{R}^n : x_j < y_j < x_j + r \text{ for } j = 1, \dots, n\}$;

you should verify that when n = 2, this gives a square, and that when
n = 3, it gives a familiar three-dimensional cube. The volume of a cube
in $\mathbf{R}^n$ with side length r is defined to be $r^n$. To define the volume of
an arbitrary set $\Omega \subset \mathbf{R}^n$, the idea is to write $\Omega$ as a subset of a union of
many small cubes, then add up the volumes of these small cubes. As
we approximate $\Omega$ more accurately by unions (perhaps infinite unions)
of small cubes, we get a better estimate of volume $\Omega$.

Rather than take the trouble to make precise this definition of
volume, we will work only with an intuitive notion of volume. Our purpose
in this book is to understand linear algebra, whereas notions of volume
belong to analysis (though as we will soon see, volume is intimately
connected with determinants). Thus for the rest of this section we will rely
on intuitive notions of volume rather than on a rigorous development,
though we shall maintain our usual rigor in the linear algebra parts
of what follows. Everything said here about volume will be correct;
the intuitive reasons given here can be converted into formally correct
proofs using the machinery of analysis.

For $T \in \mathcal{L}(V)$ and $\Omega \subset \mathbf{R}^n$, define $T(\Omega)$ by

$T(\Omega) = \{Tx : x \in \Omega\}$.

Our goal is to find a formula for the volume of $T(\Omega)$ in terms of T
and the volume of $\Omega$. First let's consider a simple example. Suppose
$\lambda_1, \dots, \lambda_n$ are positive numbers. Define $T \in \mathcal{L}(\mathbf{R}^n)$ by
$T(x_1, \dots, x_n) = (\lambda_1 x_1, \dots, \lambda_n x_n)$. If $\Omega$ is a cube in $\mathbf{R}^n$ with side length r, then $T(\Omega)$
is a box in $\mathbf{R}^n$ with sides of length $\lambda_1 r, \dots, \lambda_n r$. This box has volume
$\lambda_1 \cdots \lambda_n r^n$, whereas the cube $\Omega$ has volume $r^n$. Thus this particular T,
when applied to a cube, multiplies volumes by a factor of $\lambda_1 \cdots \lambda_n$,
which happens to equal det T.

As above, assume that $\lambda_1, \dots, \lambda_n$ are positive numbers. Now
suppose that $(e_1, \dots, e_n)$ is an orthonormal basis of $\mathbf{R}^n$ and T is the
operator on $\mathbf{R}^n$ that satisfies $Te_j = \lambda_j e_j$ for $j = 1, \dots, n$. In the special
case where $(e_1, \dots, e_n)$ is the standard basis of $\mathbf{R}^n$, this operator is the
same one as defined in the paragraph above. Even for an arbitrary
orthonormal basis $(e_1, \dots, e_n)$, this operator has the same behavior as
the one in the paragraph above: it multiplies the jth basis vector by
a factor of $\lambda_j$. Thus we can reasonably assume that this operator also
multiplies volumes by a factor of $\lambda_1 \cdots \lambda_n$, which again equals det T.


Readers familiar with 
outer measure will 
recognize that concept 
here. 
We need one more ingredient before getting to the main result in this
section. Suppose $S \in \mathcal{L}(\mathbf{R}^n)$ is an isometry. For
$x, y \in \mathbf{R}^n$, we have

\[ \|Sx - Sy\| = \|S(x - y)\| = \|x - y\|. \]

In other words, $S$ does not change the distance between points. As you
can imagine, this means that $S$ does not change volumes. Specifically,
if $\Omega \subset \mathbf{R}^n$, then
$\operatorname{volume} S(\Omega) = \operatorname{volume} \Omega$.
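
As a quick illustration of the distance-preserving property, here is a sketch that uses a rotation of $\mathbf{R}^2$ as the isometry. The angle, the two points, and the function name `rotate` are assumptions made for this example only.

```python
from math import cos, sin, dist

def rotate(theta, p):
    """Apply the rotation by angle theta (an isometry of R^2) to the point p."""
    x, y = p
    return (cos(theta) * x - sin(theta) * y,
            sin(theta) * x + cos(theta) * y)

# Hypothetical points and angle.
x = (1.0, 2.0)
y = (-3.0, 0.5)
theta = 0.7

before = dist(x, y)                                # ||x - y||
after = dist(rotate(theta, x), rotate(theta, y))   # ||Sx - Sy||
```

Up to rounding, `before` and `after` coincide, matching the displayed equation for this particular choice of $S$.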

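Now we can give our pseudoproof that an operator
$T \in \mathcal{L}(\mathbf{R}^n)$ changes volumes by a factor of $|\det T|$.

10.38 Theorem: If $T \in \mathcal{L}(\mathbf{R}^n)$, then

\[ \operatorname{volume} T(\Omega) = |\det T| \, (\operatorname{volume} \Omega) \]

for $\Omega \subset \mathbf{R}^n$.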

Proof: First consider the case where $T \in \mathcal{L}(\mathbf{R}^n)$ is a
positive operator. Let $\lambda_1, \dots, \lambda_n$ be the eigenvalues of $T$,
repeated according to multiplicity. Each of these eigenvalues is a
nonnegative number (see 7.27). By the real spectral theorem (7.13), there
is an orthonormal basis $(e_1, \dots, e_n)$ of $\mathbf{R}^n$ such that
$Te_j = \lambda_j e_j$ for each $j$. As discussed above, this implies that $T$
changes volumes by a factor of $\det T$.

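Now suppose $T \in \mathcal{L}(\mathbf{R}^n)$ is an arbitrary operator. By the
polar decomposition (7.41), there is an isometry
$S \in \mathcal{L}(\mathbf{R}^n)$ such that

\[ T = S\sqrt{T^*T}. \]

If $\Omega \subset \mathbf{R}^n$, then $T(\Omega) = S(\sqrt{T^*T}(\Omega))$. Thus

\[
\begin{aligned}
\operatorname{volume} T(\Omega) &= \operatorname{volume} S(\sqrt{T^*T}(\Omega)) \\
&= \operatorname{volume} \sqrt{T^*T}(\Omega) \\
&= (\det \sqrt{T^*T}) (\operatorname{volume} \Omega) \\
&= |\det T| \, (\operatorname{volume} \Omega),
\end{aligned}
\]

where the second equality holds because volumes are not changed by the
isometry $S$ (as discussed above), the third equality holds by the
previous paragraph (applied to the positive operator $\sqrt{T^*T}$), and the
fourth equality holds by 10.37. ■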
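
The conclusion of 10.38 can be tested in a simple planar case. The sketch below applies a hypothetical operator on $\mathbf{R}^2$ (with negative determinant) to the unit square and measures the area of the resulting parallelogram with the shoelace formula. The matrix and the helper names `apply` and `polygon_area` are assumptions for illustration.

```python
def apply(T, p):
    """Apply the 2-by-2 matrix T (given as a tuple of rows) to the point p."""
    (a, b), (c, d) = T
    x, y = p
    return (a * x + b * y, c * x + d * y)

def polygon_area(points):
    """Area of a simple polygon from its vertices in order (shoelace formula)."""
    n = len(points)
    s = 0.0
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2

# Hypothetical operator with determinant 2*(-3) - 1*(-1) = -5.
T = ((2.0, 1.0), (-1.0, -3.0))
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]  # volume (area) 1

image = [apply(T, p) for p in square]
area_image = polygon_area(image)
det_T = T[0][0] * T[1][1] - T[0][1] * T[1][0]
```

The area of the image equals `abs(det_T)` even though `det_T` itself is negative, which is why the absolute value appears in the theorem.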





The theorem above leads to the appearance of determinants in the formula
for change of variables in multivariable integration. To describe this,
we will again be vague and intuitive. If $\Omega \subset \mathbf{R}^n$ and $f$
is a real-valued function (not necessarily linear) on $\Omega$, then the
integral of $f$ over $\Omega$, denoted $\int_\Omega f$ or
$\int_\Omega f(x)\,dx$, is defined by breaking $\Omega$ into pieces small
enough so that $f$ is almost constant on each piece. On each piece,
multiply the (almost constant) value of $f$ by the volume of the piece,
then add up these numbers for all the pieces, getting an approximation to
the integral that becomes more accurate as we divide $\Omega$ into finer
pieces. Actually $\Omega$ needs to be a reasonable set (for example, open or
measurable) and $f$ needs to be a reasonable function (for example,
continuous or measurable), but we will not worry about those
technicalities. Also, notice that the $x$ in $\int_\Omega f(x)\,dx$ is a
dummy variable and could be replaced with any other symbol.
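
The informal description of the integral translates directly into a sum over small pieces. The following sketch takes $\Omega$ to be the unit square in $\mathbf{R}^2$, cut into small squares, and uses a hypothetical function $f$. The choice of $\Omega$, of $f$, and of the midpoint as the sample point in each piece are all assumptions of this example.

```python
def riemann_sum(f, n):
    """Approximate the integral of f over the unit square using n*n equal pieces."""
    h = 1.0 / n                    # side length of each small square
    total = 0.0
    for i in range(n):
        for j in range(n):
            x = (i + 0.5) * h      # sample point: center of the piece
            y = (j + 0.5) * h
            total += f(x, y) * h * h   # (value of f) * (volume of the piece)
    return total

def f(x, y):
    # Hypothetical integrand; its exact integral over the unit square is 5/6.
    return x * x + y

coarse = riemann_sum(f, 4)
fine = riemann_sum(f, 200)
```

Comparing `coarse` and `fine` with the exact value 5/6 shows the approximation improving as the pieces get finer.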

Fix a set $\Omega \subset \mathbf{R}^n$ and a function (not necessarily linear)
$\sigma \colon \Omega \to \mathbf{R}^n$. We will use $\sigma$ to make a change of
variables in an integral. Before we can get to that, we need to define
the derivative of $\sigma$, a concept that uses linear algebra. For
$x \in \Omega$, the derivative of $\sigma$ at $x$ is an operator
$T \in \mathcal{L}(\mathbf{R}^n)$ such that

\[ \lim_{y \to 0} \frac{\|\sigma(x + y) - \sigma(x) - Ty\|}{\|y\|} = 0. \]

If an operator $T \in \mathcal{L}(\mathbf{R}^n)$ exists satisfying the equation
above, then $\sigma$ is said to be differentiable at $x$. If $\sigma$ is
differentiable at $x$, then there is a unique operator
$T \in \mathcal{L}(\mathbf{R}^n)$ satisfying the equation above (we will not
prove this). This operator $T$ is denoted $\sigma'(x)$. Intuitively, the idea
is that for $x$ fixed and $\|y\|$ small, a good approximation to
$\sigma(x + y)$ is $\sigma(x) + (\sigma'(x))(y)$ (note that
$\sigma'(x) \in \mathcal{L}(\mathbf{R}^n)$, so this makes sense). Note that for
$x$ fixed the addition of the term $\sigma(x)$ does not change volumes. Thus
if $\Gamma$ is a small subset of $\Omega$ containing $x$, then
$\operatorname{volume} \sigma(\Gamma)$ is approximately equal to
$\operatorname{volume} (\sigma'(x))(\Gamma)$.
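
To get a feel for the limit in the definition, one can evaluate the quotient for smaller and smaller $y$. The sketch below does this for a hypothetical map $\sigma(x_1, x_2) = (x_1^2, x_1 x_2)$ on $\mathbf{R}^2$. Both the map and the candidate operator `T` (written down by hand) are assumptions made for illustration.

```python
from math import hypot

def sigma(x1, x2):
    # Hypothetical nonlinear map, chosen only for illustration.
    return (x1 * x1, x1 * x2)

def T(x, y):
    # Candidate derivative of sigma at x = (x1, x2), applied to y = (y1, y2).
    x1, x2 = x
    y1, y2 = y
    return (2 * x1 * y1, x2 * y1 + x1 * y2)

def ratio(x, y):
    """The quotient ||sigma(x + y) - sigma(x) - Ty|| / ||y|| from the definition."""
    sx = sigma(*x)
    sxy = sigma(x[0] + y[0], x[1] + y[1])
    ty = T(x, y)
    num = hypot(sxy[0] - sx[0] - ty[0], sxy[1] - sx[1] - ty[1])
    return num / hypot(*y)

x = (2.0, 3.0)
# Let y = (t, t) shrink toward 0.
ratios = [ratio(x, (t, t)) for t in (1e-1, 1e-2, 1e-3)]
```

For this particular map the quotient works out to $t$ along $y = (t, t)$, so the entries of `ratios` shrink toward 0, consistent with `T` being the derivative of $\sigma$ at $x$.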

If $n = 1$, then the derivative in this sense is the operator on $\mathbf{R}$
of multiplication by the derivative in the usual sense of one-variable
calculus.

Because $\sigma$ is a function from $\Omega$ to $\mathbf{R}^n$, we can write

\[ \sigma(x) = (\sigma_1(x), \dots, \sigma_n(x)), \]

where each $\sigma_j$ is a function from $\Omega$ to $\mathbf{R}$. The partial
derivative of $\sigma_j$ with respect to the $k$th coordinate is denoted
$D_k \sigma_j$. Evaluating this partial derivative at a point $x \in \Omega$
gives $D_k \sigma_j(x)$. If $\sigma$ is differentiable at $x$, then the matrix
of $\sigma'(x)$ with respect to the standard basis of $\mathbf{R}^n$ contains
$D_k \sigma_j(x)$ in row $j$, column $k$ (we will not prove this). In other
words,

10.39
\[
\mathcal{M}(\sigma'(x)) =
\begin{bmatrix}
D_1 \sigma_1(x) & \dots & D_n \sigma_1(x) \\
\vdots & & \vdots \\
D_1 \sigma_n(x) & \dots & D_n \sigma_n(x)
\end{bmatrix}.
\]
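
The matrix in 10.39 can be approximated numerically by replacing each partial derivative with a difference quotient. The following is a sketch under stated assumptions: the function name `jacobian`, the step size `h`, and the sample map are all hypothetical, and central differences are only one of several reasonable choices.

```python
def jacobian(sigma, x, h=1e-6):
    """Approximate the matrix of sigma'(x): entry [j][k] estimates D_k sigma_j(x)."""
    n = len(x)
    J = [[0.0] * n for _ in range(n)]
    for k in range(n):
        xp = list(x)
        xm = list(x)
        xp[k] += h                 # step forward in the k-th coordinate
        xm[k] -= h                 # step backward in the k-th coordinate
        fp = sigma(xp)
        fm = sigma(xm)
        for j in range(n):
            J[j][k] = (fp[j] - fm[j]) / (2 * h)   # row j, column k
    return J

def sigma(x):
    # Hypothetical map on R^2, chosen only for illustration.
    return (x[0] * x[0], x[0] * x[1])

J = jacobian(sigma, (2.0, 3.0))
```

For this sample map the exact matrix at $(2, 3)$ has rows $(4, 0)$ and $(3, 2)$, and `J` matches it up to rounding.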


Suppose that $\sigma$ is differentiable at each point of $\Omega$ and that
$\sigma$ is injective on $\Omega$. Let $f$ be a real-valued function defined on
$\sigma(\Omega)$. Let $x \in \Omega$ and let $\Gamma$ be a small subset of
$\Omega$ containing $x$. As we noted above,

\[ \operatorname{volume} \sigma(\Gamma) \approx \operatorname{volume} (\sigma'(x))(\Gamma), \]

where the symbol $\approx$ means "approximately equal to". Using 10.38, this
becomes

\[ \operatorname{volume} \sigma(\Gamma) \approx |\det \sigma'(x)| \, (\operatorname{volume} \Gamma). \]

Let $y = \sigma(x)$. Multiply the left side of the equation above by $f(y)$
and the right side by $f(\sigma(x))$ (because $y = \sigma(x)$, these two
quantities are equal), getting

10.40
\[ f(y) \operatorname{volume} \sigma(\Gamma) \approx f(\sigma(x)) \, |\det \sigma'(x)| \, (\operatorname{volume} \Gamma). \]

Now divide $\Omega$ into many small pieces and add the corresponding
versions of 10.40, getting

10.41
\[ \int_{\sigma(\Omega)} f(y)\,dy = \int_\Omega f(\sigma(x)) \, |\det \sigma'(x)|\,dx. \]


If you are not familiar 
with polar and 
spherical coordinates, 
skip the remainder of 
this section. 


This formula was our goal. It is called a change of variables formula
because you can think of $y = \sigma(x)$ as a change of variables.

The key point when making a change of variables is that the factor of
$|\det \sigma'(x)|$ must be included, as in the right side of 10.41. We finish
up by illustrating this point with two important examples. When $n = 2$,
we can use the change of variables induced by polar coordinates. In this
case $\sigma$ is defined by

\[ \sigma(r, \theta) = (r \cos \theta, r \sin \theta), \]

where we have used $r, \theta$ as the coordinates instead of $x_1, x_2$ for
reasons that will be obvious to everyone familiar with polar coordinates
(and will be a mystery to everyone else). For this choice of $\sigma$, the
matrix of partial derivatives corresponding to 10.39 is

\[
\begin{bmatrix}
\cos \theta & -r \sin \theta \\
\sin \theta & r \cos \theta
\end{bmatrix},
\]

as you should verify. The determinant of the matrix above equals $r$,
thus explaining why a factor of $r$ is needed when computing an integral
in polar coordinates.
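
The role of the factor $r$ can be seen in a small numerical experiment. The sketch below approximates the right side of the change of variables formula for the unit disk, with $(r, \theta)$ ranging over $(0, 1) \times (0, 2\pi)$. The function name `integrate_polar`, the grid size, and the two integrands are assumptions for illustration.

```python
from math import cos, sin, pi

def integrate_polar(f, n=200):
    """Approximate the integral of f over the unit disk using polar coordinates.

    Each term is f(sigma(r, theta)) * r * dr * dtheta, where the factor r
    is the determinant of the matrix of partial derivatives of sigma.
    """
    dr = 1.0 / n
    dtheta = 2 * pi / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * dr
        for j in range(n):
            theta = (j + 0.5) * dtheta
            y1, y2 = r * cos(theta), r * sin(theta)   # y = sigma(r, theta)
            total += f(y1, y2) * r * dr * dtheta
    return total

# Integrating the constant 1 should give the area of the unit disk, pi.
area = integrate_polar(lambda y1, y2: 1.0)
# Integrating y1^2 + y2^2 over the unit disk should give pi / 2.
second_moment = integrate_polar(lambda y1, y2: y1 * y1 + y2 * y2)
```

Dropping the factor `r` from the sum would give $2\pi$ instead of $\pi$ for the area, which is the kind of error the determinant factor prevents.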

Finally, when $n = 3$, we can use the change of variables induced by
spherical coordinates. In this case $\sigma$ is defined by

\[ \sigma(\rho, \varphi, \theta) = (\rho \sin \varphi \cos \theta, \rho \sin \varphi \sin \theta, \rho \cos \varphi), \]

where we have used $\rho, \theta, \varphi$ as the coordinates instead of
$x_1, x_2, x_3$ for reasons that will be obvious to everyone familiar with
spherical coordinates (and will be a mystery to everyone else). For this
choice of $\sigma$, the matrix of partial derivatives corresponding to 10.39
is

\[
\begin{bmatrix}
\sin \varphi \cos \theta & \rho \cos \varphi \cos \theta & -\rho \sin \varphi \sin \theta \\
\sin \varphi \sin \theta & \rho \cos \varphi \sin \theta & \rho \sin \varphi \cos \theta \\
\cos \varphi & -\rho \sin \varphi & 0
\end{bmatrix},
\]

as you should verify. You should also verify that the determinant of the
matrix above equals $\rho^2 \sin \varphi$, thus explaining why a factor of
$\rho^2 \sin \varphi$ is needed when computing an integral in spherical
coordinates.
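
A numerical spot check is no substitute for verifying the determinant by hand, but it can catch slips. The sketch below builds the matrix of partial derivatives for spherical coordinates at one hypothetical point and compares its determinant with $\rho^2 \sin \varphi$. The helper names `det3` and `spherical_matrix` and the sample point are assumptions.

```python
from math import sin, cos

def det3(m):
    """Determinant of a 3-by-3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def spherical_matrix(rho, phi, theta):
    """Matrix of partial derivatives of sigma(rho, phi, theta), columns in that order."""
    return (
        (sin(phi) * cos(theta), rho * cos(phi) * cos(theta), -rho * sin(phi) * sin(theta)),
        (sin(phi) * sin(theta), rho * cos(phi) * sin(theta),  rho * sin(phi) * cos(theta)),
        (cos(phi),              -rho * sin(phi),              0.0),
    )

# Hypothetical sample point.
rho, phi, theta = 2.0, 0.9, 1.3
computed = det3(spherical_matrix(rho, phi, theta))
claimed = rho ** 2 * sin(phi)
```

At this point `computed` and `claimed` agree to within rounding error.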
Exercises

1. Suppose $T \in \mathcal{L}(V)$ and $(v_1, \dots, v_n)$ is a basis of $V$.
   Prove that $\mathcal{M}(T, (v_1, \dots, v_n))$ is invertible if and only if
   $T$ is invertible.

2. Prove that if A and B are square matrices of the same size and 
AB = I, then BA = I. 

3. Suppose $T \in \mathcal{L}(V)$ has the same matrix with respect to every
   basis of $V$. Prove that $T$ is a scalar multiple of the identity
   operator.

4. Suppose that $(u_1, \dots, u_n)$ and $(v_1, \dots, v_n)$ are bases of $V$.
   Let $T \in \mathcal{L}(V)$ be the operator such that $Tv_k = u_k$ for
   $k = 1, \dots, n$. Prove that

   \[ \mathcal{M}(T, (v_1, \dots, v_n)) = \mathcal{M}(I, (u_1, \dots, u_n), (v_1, \dots, v_n)). \]

5. Prove that if $B$ is a square matrix with complex entries, then there
   exists an invertible square matrix $A$ with complex entries such
   that $A^{-1}BA$ is an upper-triangular matrix.

6. Give an example of a real vector space $V$ and $T \in \mathcal{L}(V)$ such
   that $\operatorname{trace}(T^2) < 0$.

7. Suppose $V$ is a real vector space, $T \in \mathcal{L}(V)$, and $V$ has a
   basis consisting of eigenvectors of $T$. Prove that
   $\operatorname{trace}(T^2) \ge 0$.

8. Suppose $V$ is an inner-product space and $v, w \in V$. Define
   $T \in \mathcal{L}(V)$ by $Tu = \langle u, v \rangle w$. Find a formula for
   $\operatorname{trace} T$.

9. Prove that if $P \in \mathcal{L}(V)$ satisfies $P^2 = P$, then
   $\operatorname{trace} P$ is a nonnegative integer.

10. Prove that if $V$ is an inner-product space and $T \in \mathcal{L}(V)$,
    then

    \[ \operatorname{trace} T^* = \overline{\operatorname{trace} T}. \]

11. Suppose $V$ is an inner-product space. Prove that if
    $T \in \mathcal{L}(V)$ is a positive operator and
    $\operatorname{trace} T = 0$, then $T = 0$.
12. Suppose $T \in \mathcal{L}(\mathbf{C}^3)$ is the operator whose matrix is

    \[
    \begin{bmatrix}
    51 & -12 & -21 \\
    60 & -40 & -28 \\
    57 & -68 & 1
    \end{bmatrix}.
    \]

    Someone tells you (accurately) that $-48$ and $24$ are eigenvalues
    of $T$. Without using a computer or writing anything down, find
    the third eigenvalue of $T$.

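13. Prove or give a counterexample: if $T \in \mathcal{L}(V)$ and
    $c \in \mathbf{F}$, then
    $\operatorname{trace}(cT) = c \operatorname{trace} T$.

14. Prove or give a counterexample: if $S, T \in \mathcal{L}(V)$, then
    $\operatorname{trace}(ST) = (\operatorname{trace} S)(\operatorname{trace} T)$.

15. Suppose $T \in \mathcal{L}(V)$. Prove that if
    $\operatorname{trace}(ST) = 0$ for all $S \in \mathcal{L}(V)$, then $T = 0$.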

16. Suppose $V$ is an inner-product space and $T \in \mathcal{L}(V)$. Prove
    that if $(e_1, \dots, e_n)$ is an orthonormal basis of $V$, then

    \[ \operatorname{trace}(T^*T) = \|Te_1\|^2 + \dots + \|Te_n\|^2. \]

    Conclude that the right side of the equation above is independent
    of which orthonormal basis $(e_1, \dots, e_n)$ is chosen for $V$.

17. Suppose $V$ is a complex inner-product space and $T \in \mathcal{L}(V)$.
    Let $\lambda_1, \dots, \lambda_n$ be the eigenvalues of $T$, repeated
    according to multiplicity. Suppose

    \[
    \begin{bmatrix}
    a_{1,1} & \dots & a_{1,n} \\
    \vdots & & \vdots \\
    a_{n,1} & \dots & a_{n,n}
    \end{bmatrix}
    \]

    is the matrix of $T$ with respect to some orthonormal basis of $V$.
    Prove that

    \[ |\lambda_1|^2 + \dots + |\lambda_n|^2 \le \sum_{k=1}^{n} \sum_{j=1}^{n} |a_{j,k}|^2. \]

18. Suppose $V$ is an inner-product space. Prove that

    \[ \langle S, T \rangle = \operatorname{trace}(ST^*) \]

    defines an inner product on $\mathcal{L}(V)$.


19. Suppose $V$ is an inner-product space and $T \in \mathcal{L}(V)$. Prove
    that if

    \[ \|T^*v\| \le \|Tv\| \]

    for every $v \in V$, then $T$ is normal.

Exercise 19 fails on infinite-dimensional inner-product spaces, leading
to what are called hyponormal operators, which have a well-developed
theory.


20. Prove or give a counterexample: if $T \in \mathcal{L}(V)$ and
    $c \in \mathbf{F}$, then $\det(cT) = c^{\dim V} \det T$.


21. Prove or give a counterexample: if $S, T \in \mathcal{L}(V)$, then
    $\det(S + T) = \det S + \det T$.


22. Suppose $A$ is a block upper-triangular matrix

    \[
    A =
    \begin{bmatrix}
    A_1 & & * \\
    & \ddots & \\
    0 & & A_m
    \end{bmatrix},
    \]

    where each $A_j$ along the diagonal is a square matrix. Prove that

    \[ \det A = (\det A_1) \cdots (\det A_m). \]


23. Suppose $A$ is an $n$-by-$n$ matrix with real entries. Let
    $S \in \mathcal{L}(\mathbf{C}^n)$ denote the operator on $\mathbf{C}^n$ whose
    matrix equals $A$, and let $T \in \mathcal{L}(\mathbf{R}^n)$ denote the
    operator on $\mathbf{R}^n$ whose matrix equals $A$. Prove that
    $\operatorname{trace} S = \operatorname{trace} T$ and $\det S = \det T$.

24. Suppose $V$ is an inner-product space and $T \in \mathcal{L}(V)$. Prove
    that

    \[ \det T^* = \overline{\det T}. \]

    Use this to prove that $|\det T| = \det \sqrt{T^*T}$, giving a different
    proof than was given in 10.37.


25. Let $a, b, c$ be positive numbers. Find the volume of the ellipsoid

    \[ \Bigl\{ (x, y, z) \in \mathbf{R}^3 : \frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} < 1 \Bigr\} \]

    by finding a set $\Omega \subset \mathbf{R}^3$ whose volume you know and an
    operator $T \in \mathcal{L}(\mathbf{R}^3)$ such that $T(\Omega)$ equals the
    ellipsoid above.